Delta Logic in Spark

2018-11-17 Thread Mahender Sarangam
Hi,

We have a daily data pull that brings in roughly 50 GB of data from an upstream system.
We process that 50 GB with Spark SQL and insert it into a Hive target table. Today we
copy the whole Hive target table to SQL Server, into a SQL staging table, and then run a
merge from the staging table against the final SQL target table so that only modified or
new records are inserted into the SQL target table. This process is time consuming, with
most of the time spent copying data from Blob storage to SQL Server. Instead of copying
the whole data set from the cluster to SQL Server and implementing the merge logic in
SQL, we would like to implement the merge logic in Spark SQL, move only the delta to SQL
Server, and merge that against the final SQL target table. This would reduce network and
I/O cost. Has anyone implemented delta detection in Spark / Spark SQL?
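A rough sketch of one way to compute the delta in Spark SQL before exporting (a sketch only;
the table names hive_target_today and hive_target_prev, the key business_key, and the
compared columns are all illustrative, and it assumes one row per key):

-- keep only new or changed rows by comparing today's load against the previous snapshot
CREATE TABLE delta_out AS
SELECT t.*
FROM hive_target_today t
LEFT JOIN hive_target_prev p
  ON t.business_key = p.business_key
WHERE p.business_key IS NULL            -- brand-new rows
   OR t.col1 <> p.col1                  -- changed rows (repeat per tracked column, or compare
   OR t.col2 <> p.col2;                 --  a row hash; use <=> if the columns can be NULL)

Only delta_out would then be copied into the SQL Server staging table and merged into the
final SQL target, instead of the full 50 GB.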



Re: Is there way to purge logs

2018-10-01 Thread Mahender Sarangam
Gentle ping. Any ideas on dealing with this kind of scenario?

On 9/16/2018 10:35 AM, Tharun M wrote:
Hi,
We are also facing the same issue. /user/hive/warehouse always reaches its hard
quota and jobs fail. We often reach out to users to delete old tables/DBs. Is
there a good way to handle this at the enterprise level (hundreds of users and
thousands of databases)?

On Sun, Sep 16, 2018 at 00:31 Mahender Sarangam <mahender.bigd...@outlook.com> wrote:
Hi,

Our storage holds TBs of data under the \User folder; it contains users and their
logs. Is there a way to set a limit or quota and automatically clean up the folder
if it grows beyond a certain limit?


$ sudo -u hdfs hdfs dfsadmin -setSpaceQuota 10g /user

I know the above command sets the limit, but is there a better way to do the cleanup?






Is there way to purge logs

2018-09-16 Thread Mahender Sarangam
Hi,

Our storage holds TBs of data under the \User folder; it contains users and their
logs. Is there a way to set a limit or quota and automatically clean up the folder
if it grows beyond a certain limit?


$ sudo -u hdfs hdfs dfsadmin -setSpaceQuota 10g /user

I know the above command sets the limit, but is there a better way to do the cleanup?






Internal table stored NULL as \N. How to remove it

2018-06-23 Thread Mahender Sarangam
Hi,
We store our final transformed data in a Hive table in JSON format. While
writing data into the table, all null fields are converted to \N, and while
reading the table we see \N instead of NULL. We tried setting:

ALTER TABLE sample set SERDEPROPERTIES ('serialization.null.format' = "\N");
ALTER TABLE sample set TBLPROPERTIES ('serialization.null.format' = "\N");

but it didn't work. Is there a better approach that does not require reloading
the data into the table? We cannot use regexp_replace on 100 columns of the
table while querying the data.
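For reference, a minimal sketch assuming the table is backed by Hive's default text SerDe
(LazySimpleSerDe); the backslash usually needs escaping in the DDL, the table name is
illustrative, and custom SerDes (e.g. a JSON SerDe) may not honour this property at all:

ALTER TABLE sample SET SERDEPROPERTIES ('serialization.null.format' = '\\N');

-- quick check: fields that were written as \N should now come back as NULL
SELECT col1, col1 IS NULL AS col1_is_null FROM sample LIMIT 10;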




Re: drop partitions

2018-06-23 Thread Mahender Sarangam
Thanks Sajid and Alan. I got the right syntax; thanks for the pointer.

From: Alan Gates 
Sent: Tuesday, June 19, 2018 12:26 AM
To: user@hive.apache.org
Subject: Re: drop partitions

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions

Alan.

On Sat, Jun 16, 2018 at 8:03 PM Mahender Sarangam <mahender.bigd...@outlook.com> wrote:
Hi All,

What is the right syntax for dropping a range of partitions: ALTER TABLE ... DROP IF
EXISTS PARTITION (date > 'date1'), PARTITION (date < 'date2'), or ALTER TABLE ...
DROP IF EXISTS PARTITION (date > 'date1', date < 'date2')?


Mahens



drop partitions

2018-06-16 Thread Mahender Sarangam
Hi All,

What is the right syntax for dropping a range of partitions: ALTER TABLE ... DROP IF
EXISTS PARTITION (date > 'date1'), PARTITION (date < 'date2'), or ALTER TABLE ...
DROP IF EXISTS PARTITION (date > 'date1', date < 'date2')?


Mahens
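For reference, a hedged sketch of the second form, which per the DDL documentation linked
in the reply above drops every partition matching all comparisons inside a single
PARTITION spec (table and partition column names are illustrative):

-- drops all partitions with 'date1' < date < 'date2'
ALTER TABLE my_table DROP IF EXISTS PARTITION (`date` > 'date1', `date` < 'date2');

Listing two separate PARTITION specs instead drops everything above 'date1' plus
everything below 'date2', i.e. effectively all partitions.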



Hive External Table on particular set of files.

2018-06-03 Thread Mahender Sarangam
We copy files from our upstream system that arrive in gzipped JSON (.json.gz) format.
They follow a pattern for each daily slice, e.g. YYYYMMDDHH (2018053100), and maintain
two folders, DATA and METADATA, where DATA holds the actual data and METADATA holds the
row count of that day's data. We need to create an external table on top of the copied
data that considers only files with the *.json.gz extension and excludes other file
extensions. We don't want to copy the files into another location since they are large
in size. We also tried the INPUT__FILE__NAME virtual column, but it didn't work. Any
suggestions for this scenario?
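One possible direction, sketched under the assumption that the non-JSON files live under
METADATA (or other sibling folders) while the *.json.gz files sit under each slice's DATA
folder; all names and the path below are illustrative:

CREATE EXTERNAL TABLE upstream_json (col1 STRING, col2 STRING)
PARTITIONED BY (slice STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- register only each slice's DATA folder, so METADATA files are never scanned
-- and no data has to be copied or moved
ALTER TABLE upstream_json ADD IF NOT EXISTS PARTITION (slice='2018053100')
LOCATION 'wasb://.../date/2018053100/DATA';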


Re: Hive External Table with Zero Bytes files

2018-05-08 Thread Mahender Sarangam
Thanks Nishanth. We are also cleaning up the zero-byte files.




On 5/2/2018 8:53 AM, Nishanth S wrote:
I have run into a similar issue with Avro files. The solution was to fix the
upstream jobs that were writing data to those directories. In our case the
writers were not flushed/closed correctly during certain events, which caused
the issue. Fixing those prevented these 0-sized files.

-NS

On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam <mahender.bigd...@outlook.com> wrote:

ping..

On 5/1/2018 3:57 AM, Mahender Sarangam wrote:

Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 
(Folder) and same has Filename. actually in our upstream, our source table is 
partitioned by Date. Whenever a table is partitioned, we see Zero Byte. Now 
when we create external table with partitioned by columns and fire select query 
no data is returned. . If I delete manually those files (Zero Bytes), we were 
able to read.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is 
empty. Point your table to the capital .../DAY=201803250 and you should be able 
to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create 
an external table with a partition clause, then alter the table and add 
partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying 
data from upstream system into our storage S3. As part of copy, directories 
along with Zero bytes files are been copied. Source File Format is in JSON 
format.  Below is Folder Hierarchy Structure


 DATE  -->  

---> Folder

 1.json.gz  --> File

  2.json.gz

 ---> Empty Zero Bytes Files.

Please find below screenshot

[inline screenshot omitted]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW 
FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES 
("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
 TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external 
table. Is this expected behaviour. Is there any setting for ignoring Zero bytes 
files in hive external table


-Mahens

--
Thai
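For reference, a concrete sketch of the partitioned-external-table approach Thai describes
above (names illustrative; as discussed in the thread, the zero-byte marker objects may
still need cleaning up separately). Each partition points at one DAY=... folder, so the
zero-byte objects sitting at the parent level are not part of any partition's input:

CREATE EXTERNAL TABLE Ext_STG1 (Col1 STRING, Col2 STRING, Col3 STRING)
PARTITIONED BY (`day` STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE;

ALTER TABLE Ext_STG1 ADD IF NOT EXISTS PARTITION (`day`='20180325')
LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/DAY=20180325/';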






Re: Hive External Table with Zero Bytes files

2018-05-02 Thread Mahender Sarangam
ping..

On 5/1/2018 3:57 AM, Mahender Sarangam wrote:

Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 
(Folder) and same has Filename. actually in our upstream, our source table is 
partitioned by Date. Whenever a table is partitioned, we see Zero Byte. Now 
when we create external table with partitioned by columns and fire select query 
no data is returned. . If I delete manually those files (Zero Bytes), we were 
able to read.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is 
empty. Point your table to the capital .../DAY=201803250 and you should be able 
to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create 
an external table with a partition clause, then alter the table and add 
partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying 
data from upstream system into our storage S3. As part of copy, directories 
along with Zero bytes files are been copied. Source File Format is in JSON 
format.  Below is Folder Hierarchy Structure


 DATE  -->  

---> Folder

 1.json.gz  --> File

  2.json.gz

 ---> Empty Zero Bytes Files.

Please find below screenshot

[inline screenshot omitted]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW 
FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES 
("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
 TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external 
table. Is this expected behaviour. Is there any setting for ignoring Zero bytes 
files in hive external table


-Mahens

--
Thai




Re: Hive External Table with Zero Bytes files

2018-05-01 Thread Mahender Sarangam
Thanks Thai. I mentioned the folder name incorrectly; it is the same DAY=20180325 for
both the folder and the file name. Actually, in our upstream the source table is
partitioned by date, and whenever a table is partitioned we see zero-byte files. Now,
when we create an external table partitioned by those columns and run a select query,
no data is returned. If I manually delete those zero-byte files, we are able to read
the data.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is 
empty. Point your table to the capital .../DAY=201803250 and you should be able 
to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create 
an external table with a partition clause, then alter the table and add 
partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying 
data from upstream system into our storage S3. As part of copy, directories 
along with Zero bytes files are been copied. Source File Format is in JSON 
format.  Below is Folder Hierarchy Structure


 DATE  -->  

---> Folder

 1.json.gz  --> File

  2.json.gz

 ---> Empty Zero Bytes Files.

Please find below screenshot

[inline screenshot omitted]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW 
FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES 
("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
 TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external 
table. Is this expected behaviour. Is there any setting for ignoring Zero bytes 
files in hive external table


-Mahens

--
Thai



Re: Hive External Table with Zero Bytes files

2018-04-28 Thread Mahender Sarangam
Gentle ping. Please help me with the issue below. Has anyone faced the same issue?

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying 
data from upstream system into our storage S3. As part of copy, directories 
along with Zero bytes files are been copied. Source File Format is in JSON 
format.  Below is Folder Hierarchy Structure


 DATE  -->  

---> Folder

 1.json.gz  --> File

  2.json.gz

 ---> Empty Zero Bytes Files.

Please find below screenshot

[inline screenshot omitted]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW 
FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES 
("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
 TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external 
table. Is this expected behaviour. Is there any setting for ignoring Zero bytes 
files in hive external table


-Mahens



Hive External Table with Zero Bytes files

2018-04-27 Thread Mahender Sarangam
Hi,

Has anyone faced an issue while fetching data from an external table? We are copying
data from an upstream system into our S3 storage. As part of the copy, directories
along with zero-byte files are copied. The source file format is JSON. Below is the
folder hierarchy structure:

DATE --> folder
    DAY=20180325 --> folder
        1.json.gz --> file
        2.json.gz
    (plus empty zero-byte marker files)

Please find below screenshot

[inline screenshot omitted]

We are trying to create an external table with the JSON SerDe.

ADD JAR 
wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW 
FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES 
("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 
'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
 TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


The above query returns empty results.

When I delete the zero-byte files, I can then see data when selecting from the external
table. Is this expected behaviour? Is there any setting for ignoring zero-byte files in
a Hive external table?


-Mahens


Need to read JSON File

2018-04-22 Thread Mahender Sarangam
Hi,

We have to read gzip-compressed JSON files from a source system. I see there are three
different ways of reading JSON data; our data has no nesting at all. The options we see are:

  * get_json_object, and
  * get_json_tuple.

I heard these are time-consuming, whereas the third option, org.openx.data.jsonserde, is
said to be fast, but I could not find where to download that JAR file. Can anyone provide
a location where I can download this SerDe, or are there any disadvantages to using
org.openx.data.jsonserde?

I also see that my Hive version 1.2 ships a library called json-20090211.jar. Is this
used for JSON parsing?
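For reference, a hedged sketch of the two approaches (table, column, and path names are
illustrative; the JAR path is an assumption):

-- 1) keep the raw JSON as a single STRING column and extract fields per query
SELECT get_json_object(json_line, '$.id')   AS id,
       get_json_object(json_line, '$.name') AS name
FROM raw_json_gz;

-- 2) use the openx JSON SerDe so table columns map directly to JSON keys
--    (the json-serde JAR must be added to the session or the aux libs)
ADD JAR /path/to/json-serde-1.3.x-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE parsed_json (id STRING, name STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/raw_json_gz';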





Building Data Warehouse Application in Spark

2018-04-04 Thread Mahender Sarangam
Hi,
Does anyone have a good architecture document or design principles for building a
warehouse application using Spark?

Is it better to create a HiveContext and perform the transformations with HQL, or to
load the files directly into DataFrames and perform the data transformations there?

We need to implement SCD Type 2 in Spark. Is there any good document or reference for
building Type 2 warehouse objects?

Thanks in advance

/Mahender
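A rough SCD Type 2 sketch in Spark SQL (a sketch only; the tables dim_customer and
staging_customer, the key business_key, and the tracked column attr are illustrative,
and it assumes one row per key in the staging extract). The result is written to a new
table and swapped in afterwards, since overwriting a table that is also read in the same
query is not allowed:

CREATE TABLE dim_customer_rebuilt AS
-- existing history: expire the current row when the incoming attributes differ
SELECT d.business_key,
       d.attr,
       d.start_date,
       CASE WHEN d.is_current AND s.business_key IS NOT NULL AND d.attr <> s.attr
            THEN current_date() ELSE d.end_date END   AS end_date,
       CASE WHEN d.is_current AND s.business_key IS NOT NULL AND d.attr <> s.attr
            THEN false ELSE d.is_current END           AS is_current
FROM dim_customer d
LEFT JOIN staging_customer s
  ON d.business_key = s.business_key

UNION ALL

-- new current versions: brand-new keys plus keys whose attributes changed
SELECT s.business_key, s.attr, current_date(), CAST(NULL AS DATE), true
FROM staging_customer s
LEFT JOIN (SELECT * FROM dim_customer WHERE is_current) d
  ON s.business_key = d.business_key
WHERE d.business_key IS NULL OR d.attr <> s.attr;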


Re: Yarn Queue Capacity Schedule

2017-03-22 Thread Mahender Sarangam
Ping..

On 3/17/2017 3:37 PM, Mahender Sarangam wrote:

Hi,

We have configured our Capacity Scheduler Queues as per this link:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_create_configure_yarn_capacity_scheduler_queues.html

We have two Queues Q1 and Q2 each with 50% of cluster resources. We submit 
query to Hive through HiveServer2 and WEBHCAT (Templeton). When i submit my 
query to HiveServer2, it is making use of Q1 Queue capacity. Is there a way or 
some setting which makes query submitted through WEBHCAT goes to Q2 queue 
capacity only. Our is there any command like CURL which can accept parameter to 
which queue the query needs to be submitted.  because we are seeing  one big 
query block others.. how to improve concurrency?






Yarn Queue Capacity Schedule

2017-03-17 Thread Mahender Sarangam
Hi,

We have configured our Capacity Scheduler queues as per the link referenced above.

We have two queues, Q1 and Q2, each with 50% of the cluster resources. We submit queries
to Hive through HiveServer2 and WebHCat (Templeton). When I submit a query through
HiveServer2, it uses the Q1 queue capacity. Is there a way, or some setting, to make
queries submitted through WebHCat go to the Q2 queue only? Or is there any command, e.g.
via curl, that accepts a parameter specifying which queue the query should be submitted
to? We are seeing one big query block others; how can we improve concurrency?
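One commonly used approach (hedged; these are the standard queue-selection properties for
Tez and MapReduce, though how they are passed through WebHCat depends on your setup) is to
choose the queue per session or per request rather than relying on the default:

-- for queries running on Tez
SET tez.queue.name=Q2;
-- for queries running on classic MapReduce
SET mapreduce.job.queuename=Q2;

WebHCat's Hive endpoint generally allows passing such properties as defines with the job
request, so Templeton-submitted jobs can be pinned to Q2 while HiveServer2 sessions stay
on Q1.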





Re: Hive Tez on External Table running on Single Mapper

2017-01-29 Thread Mahender Sarangam
ping..


On 1/24/2017 10:37 AM, Mahender Sarangam wrote:
> Here are the table properties
>
> TBLPROPERTIES (
> 'numFiles'='1',
> 'serialization.null.format'='',
> 'skip.header.line.count'='1',
> 'totalSize'='20971513935',
> 'transient_lastDdlTime'='1485091440');
>
>
> On 1/24/2017 10:27 AM, Mahender Sarangam wrote:
>> When i see properties of external table, STORED AS INPUTFORMAT
>> 'org.apache.hadoop.mapred.TextInputFormat'
>>
>>
>> On 1/23/2017 2:25 PM, Gopal Vijayaraghavan wrote:
>>>> We have 20 GB txt File, When we have created external table on top of 20
>>>> Gb file, we see Tez is creating only one mapper.
>>> For an uncompressed file, that is very strange. Is this created as "STORED 
>>> AS TEXTFILE" or some other strange format?
>>>
>>> Cheers,
>>> Gopal
>>>
>>>
>>>



Re: Hive Tez on External Table running on Single Mapper

2017-01-24 Thread Mahender Sarangam
Here are the table properties

TBLPROPERTIES (
   'numFiles'='1',
   'serialization.null.format'='',
   'skip.header.line.count'='1',
   'totalSize'='20971513935',
   'transient_lastDdlTime'='1485091440');


On 1/24/2017 10:27 AM, Mahender Sarangam wrote:
> When i see properties of external table, STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
>
>
> On 1/23/2017 2:25 PM, Gopal Vijayaraghavan wrote:
>>> We have 20 GB txt File, When we have created external table on top of 20
>>>Gb file, we see Tez is creating only one mapper.
>> For an uncompressed file, that is very strange. Is this created as "STORED 
>> AS TEXTFILE" or some other strange format?
>>
>> Cheers,
>> Gopal
>>
>>
>>



Re: Hive Tez on External Table running on Single Mapper

2017-01-24 Thread Mahender Sarangam
Here is the screenshot:

[inline screenshot omitted]

On 1/24/2017 10:27 AM, Mahender Sarangam wrote:

When i see properties of external table, STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'


On 1/23/2017 2:25 PM, Gopal Vijayaraghavan wrote:


We have 20 GB txt File, When we have created external table on top of 20
  Gb file, we see Tez is creating only one mapper.


For an uncompressed file, that is very strange. Is this created as "STORED AS 
TEXTFILE" or some other strange format?

Cheers,
Gopal










Re: Hive Tez on External Table running on Single Mapper

2017-01-24 Thread Mahender Sarangam
When I look at the properties of the external table, it shows STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'.


On 1/23/2017 2:25 PM, Gopal Vijayaraghavan wrote:
>> We have 20 GB txt File, When we have created external table on top of 20
>>   Gb file, we see Tez is creating only one mapper.
> For an uncompressed file, that is very strange. Is this created as "STORED AS 
> TEXTFILE" or some other strange format?
>
> Cheers,
> Gopal
>
>
>



Hive Tez on External Table running on Single Mapper

2017-01-23 Thread Mahender Sarangam
Hi,

We have a 20 GB txt file. When we create an external table on top of the 20 GB file,
we see Tez creating only one mapper. We have applied settings such as reducing the
split/block size below 128 MB and manually setting the numbers of mappers and
reducers, but to no avail. We are using Tez 0.7; is there any setting that makes the
application run with more than one mapper? The input file is not compressed; it is a
txt file with 100 columns.
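A hedged set of knobs to try (standard Tez split-grouping properties; the values are
illustrative and behaviour can differ between Hive/Tez versions). With Tez, the mapper
count for text input is driven largely by split grouping, so lowering the grouping sizes
usually produces more mappers:

SET tez.grouping.min-size=134217728;   -- 128 MB
SET tez.grouping.max-size=268435456;   -- 256 MB
-- capping the underlying input split size can also help
SET mapreduce.input.fileinputformat.split.maxsize=268435456;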




Re: Hive ORC Table

2017-01-21 Thread Mahender Sarangam
Yes, I tried the option below, but I'm not sure about the workload (data ingestion),
so I can't go with a fixed hard-coded value. I would like to know the reason for
getting 1009 reducer tasks.

On 1/20/2017 7:45 PM, goun na wrote:
Hi Mahender ,

1st :
Didn't work the following option in Tez?

set mapreduce.job.reduces=100
or
set mapred.reduce.tasks=100 (deprecated)

2nd :
Possibility of data skew. It happens when handling null sometimes.

Goun


2017-01-21 9:58 GMT+09:00 Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>>:
Hi All,

We have ORC table which is of 2 GB size. When we perform operation on
top of this ORC table, Tez always deduce 1009 reducer every time. I
searched 1009 is considered as Maximum value of number of Tez task. Is
there a way to reduce the number of reducer. I see file generated
underlying ORC some of them 500 MB or 1 GB etc. Is there way to
distribute file size to same value/same size.


My Second scenario, we have join on 5 tables all of them are left join.
Query goes fast till reached 99%. From 99% to 100% it takes too much
time. We are not involving our partition column as part of LEFT JOIN
Statement, Is there better way to resolving issues on 99% hanging
condition. My table is of 20 GB we are left joining with another table (
9,00,00,000) records.


Mahens





Hive ORC Table

2017-01-20 Thread Mahender Sarangam
Hi All,

We have an ORC table which is about 2 GB in size. When we perform operations on top of
this ORC table, Tez always derives 1009 reducers. I found that 1009 is treated as the
maximum number of Tez reduce tasks. Is there a way to reduce the number of reducers? I
also see that the files underlying the ORC table vary in size, some 500 MB, some 1 GB,
etc. Is there a way to distribute the files so they are roughly the same size?

My second scenario: we have a join on 5 tables, all of them left joins. The query runs
fast until it reaches 99%; from 99% to 100% it takes too much time. We are not using
our partition column as part of the LEFT JOIN statement. Is there a better way of
resolving the hang at 99%? Our table is about 20 GB and we are left joining it with
another table of about 90,000,000 (9,00,00,000) records. Is there a better
troubleshooting technique for identifying the root cause of the 99% hang and improving
the performance?


Mahens
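A few hedged pointers (standard Hive properties; defaults can differ per distribution).
1009 is the default value of hive.exec.reducers.max, and the estimated reducer count also
depends on hive.exec.reducers.bytes.per.reducer, so both can be tuned. For the hang at
99% on a large left join, a single reducer handling skewed keys (NULLs are a frequent
culprit) is the usual suspect:

SET hive.exec.reducers.max=200;                       -- cap the reducer count
SET hive.exec.reducers.bytes.per.reducer=268435456;   -- ~256 MB per reducer

-- if a few join keys dominate the data:
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;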



Hive ORC Table

2017-01-20 Thread Mahender Sarangam
Hi All,

We have ORC table which is of 2 GB size. When we perform operation on 
top of this ORC table, Tez always deduce 1009 reducer every time. I 
searched 1009 is considered as Maximum value of number of Tez task. Is 
there a way to reduce the number of reducer. I see file generated 
underlying ORC some of them 500 MB or 1 GB etc. Is there way to 
distribute file size to same value/same size.


My Second scenario, we have join on 5 tables all of them are left join. 
Query goes fast till reached 99%. From 99% to 100% it takes too much 
time. We are not involving our partition column as part of LEFT JOIN 
Statement, Is there better way to resolving issues on 99% hanging 
condition. My table is of 20 GB we are left joining with another table ( 
9,00,00,000) records.


Mahens



Re: DateFunction

2017-01-17 Thread Mahender Sarangam
We are using Hive 1.2.1 and it is working. Thank you.

On 1/16/2017 7:40 AM, Devopam Mittra wrote:
hi Mahender,

I don't know your version of Hive .
Please try :
date_format(curren_date,'M')

regards
Dev


On Mon, Jan 16, 2017 at 6:56 PM, Jitendra Yadav <jeetuyadav200...@gmail.com> wrote:
Ref:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

int   month(string date)   Returns the month part of a date or a timestamp string:
month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.


Does it fit in your requirement?.

Thanks

On Mon, Jan 16, 2017 at 12:21 PM, Mahender Sarangam <mahender.bigd...@outlook.com> wrote:
Hi,

Is there any Date Function which returns Full Month Name for given time
stamp.





--
Devopam Mittra
Life and Relations are not binary



Re: DateFunction

2017-01-17 Thread Mahender Sarangam
Hi Jitendra,

We are actually looking for the full month name, not just the month number.

On 1/16/2017 5:26 AM, Jitendra Yadav wrote:

month(string date)



Fwd: Support of Theta Join

2017-01-16 Thread Mahender Sarangam
Is there any support for theta joins in Spark? We have a requirement to identify
the country name based on ranges of IP addresses stored in a table.


 Forwarded Message 
Subject:Support of Theta Join
Date:   Thu, 12 Jan 2017 15:19:51 +
From:   Mahender Sarangam <mahender.bigd...@outlook.com>
To:     user <u...@spark.apache.org>



Hi All,

Is there any support for theta joins in Spark? We want to identify the
country based on IP address ranges (which we have in our DB).
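A minimal sketch of such a range (theta) join in Spark SQL, with illustrative table and
column names; Spark executes non-equi join conditions, though typically as a broadcast
nested loop join, so keeping the ranges table small enough to broadcast helps:

SELECT l.ip, g.country
FROM access_log l
JOIN geo_ip_ranges g
  ON l.ip_num >= g.range_start_num
 AND l.ip_num <= g.range_end_num;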







DateFunction

2017-01-16 Thread Mahender Sarangam
Hi,

Is there any date function which returns the full month name for a given
timestamp?
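For reference, a short example (date_format was added in Hive 1.2.0 and follows Java
SimpleDateFormat patterns, where 'MMMM' is the full month name):

SELECT date_format(current_timestamp, 'MMMM');   -- e.g. 'January'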



Re: Zero Bytes Files importance

2017-01-10 Thread Mahender Sarangam
Thanks Gopal for providing a detailed explanation.


On 1/3/2017 5:59 PM, Gopal Vijayaraghavan wrote:
>> Thanks Gopal. Yeah I'm using CloudBerry.  Storage is Azure.
> Makes sense, only an object store would have this.
>
>> Are you saying this _0,1,2,3 are directories ?.
> No, only the zero size "files".
>
> This is really for compat with regular filesystems.
>
> If you have /tmp/1/foo in an object store that's a single key. That does not 
> imply you'll find "/tmp" or "/tmp/1" in the object store keys.
>
> A FileSystem however assumes parent directories are "real things", so any 
> FileSystem abstraction has to maintain "/tmp", "/tmp/1/" and "/tmp/1/foo" to 
> keep up the basic compatibility requirements of fs.exists("/tmp").
>
> Cheers,
> Gopal
>
>





Re: Query History

2017-01-10 Thread Mahender Sarangam
ping..


On 1/4/2017 4:27 PM, Mahender Sarangam wrote:
> Hi Team,
>
> Is there a way in Resource Manager UI or location in HDFS, where i can
> search for logs related to Application ID even after 1 week of
> execution. I can go into Ambari URI, Tez View  and Search By Application
> ID, Is there a way to know location on HDFS where Application ID logs
> like  DAGs, Number of reducers and mappers etc information stored.
>
> /Mahender
>



Accessing Yarn logs

2017-01-09 Thread Mahender Sarangam
Hi,

I'm trying to access the YARN logs at
"/mnt/resource/hadoop/yarn/log/application_1483150142223_0059/container_e03_1483150142223_0059_01_000333/",
but I am only able to navigate as far as "/mnt/resource" and cannot find the hadoop
folder inside it.

I'm browsing from the root (/) folder; is there any other location where the
/mnt/resource/hadoop folder might be present?

Below is the screenshot, where I can see only one folder, "Swap". Is there anything
I'm missing?

[inline screenshot omitted]


Query History

2017-01-04 Thread Mahender Sarangam
Hi Team,

Is there a way, in the Resource Manager UI or a location in HDFS, where I can
search for logs related to an application ID even a week after execution? I can go
into the Ambari UI, open the Tez View and search by application ID. Is there a way
to know the location on HDFS where an application's logs and details, such as DAGs
and the numbers of reducers and mappers, are stored?

/Mahender



Re: Zero Bytes Files importance

2017-01-03 Thread Mahender Sarangam
Thanks Gopal. Yeah, I'm using CloudBerry, and the storage is Azure.

Are you saying these _0, 1, 2, 3 are directories?

[inline screenshot omitted]


On 12/29/2016 11:18 AM, Gopal Vijayaraghavan wrote:




For any insert operation, there will be one Zero bytes file. I would like to 
know importance of this Zero bytes file.



They are directories.

I'm assuming you're using S3A + screenshots from something like Bucket explorer.

These directory entries will not be shown if you do something like "hadoop fs 
-ls s3a://…/"

I had a recent talk covering the specifics of S3 + Hive - 
https://www.slideshare.net/secret/3cfQbeo3cI6GpK/3

Cheers,
Gopal






Head Node with More Memory and less number cores

2017-01-02 Thread Mahender Sarangam
Hi,

I have a question about cluster configuration: is there any benefit to having a head
node with more memory and fewer cores? I have 10 data nodes, each with 4 cores and
14 GB RAM. Can anyone shed some light on having a head node with a lot of memory? How
exactly is head node memory indirectly responsible for the cluster node size? Will
there be any performance improvement (I don't think so), or do any resource outage
issues get resolved by having more memory in the head node?

Thanks,

Mahender



Re: Zero Bytes Files importance

2016-12-29 Thread Mahender Sarangam
Hi Staņislavs,

It is a simple insert statement with select statements that are unioned together, like below:

Insert into TargetTable
Select C1,C2 from Table1
Union All
Select C1,C2 from Table2

Three reducers are created, so three folders are created.

For any insert operation there is also one zero-byte file. I would like to
understand the importance of this zero-byte file.





On 12/28/2016 11:35 AM, Staņislavs Rogozins wrote:
What are the actual queries used to create the table and insert data? Are you 
enforcing bucketing or partitioning?

On Wed, Dec 28, 2016 at 1:22 PM, Mahender Sarangam <mahender.bigd...@outlook.com> wrote:

ping.

On 12/27/2016 12:36 PM, Mahender Sarangam wrote:

   Hi

When we dump or have hive query to insert data into another hive table. There 
will be Folder which contains actual data and apart from that we see another 0 
Byte File present at same level. I would like to understand importance of Zero 
byte files. What happens if we delete this file. Please find below screen shot 
like 1,2,3 with Zero bytes files.

[inline screenshot omitted]

/Mahender





Re: Zero Bytes Files importance

2016-12-28 Thread Mahender Sarangam
ping.

On 12/27/2016 12:36 PM, Mahender Sarangam wrote:

   Hi

When we dump or have hive query to insert data into another hive table. There 
will be Folder which contains actual data and apart from that we see another 0 
Byte File present at same level. I would like to understand importance of Zero 
byte files. What happens if we delete this file. Please find below screen shot 
like 1,2,3 with Zero bytes files.

[inline screenshot omitted]

/Mahender



Zero Bytes Files importance

2016-12-27 Thread Mahender Sarangam
Hi,

When we dump data or run a Hive query to insert data into another Hive table, there is
a folder which contains the actual data, and apart from that we see another 0-byte file
present at the same level. I would like to understand the importance of these zero-byte
files. What happens if we delete them? Please find below a screenshot showing 1, 2, 3
with zero-byte files.

[inline screenshot omitted]

/Mahender


Zero Bytes Files importance

2016-12-27 Thread Mahender Sarangam
   Hi

When we dump or have hive query to insert data into another hive table. There 
will be Folder which contains actual data and apart from that we see another 0 
Byte File present at same level. I would like to understand importance of Zero 
byte files. What happens if we delete this file.

[inline screenshots omitted]

/Mahender


Re: Anyway to avoid creating subdirectories by "Insert with union"

2016-12-27 Thread Mahender Sarangam
Hi Gopal,

Another question I have: whenever we run a UNION ALL statement, apart from the
folders we also see zero-byte files in HDFS. Are these lock files (LCK)?

Mahender

On 2/24/2016 4:26 PM, Gopal Vijayaraghavan wrote:
>> SET mapred.input.dir.recursive=TRUE;
> ...
>> Can we set above setting as tblProperties or Hive Table properties.
> Not directly, those are MapReduce properties - they are not settable via
> Hive tables.
>
> That said, you can write your own SemanticAnalyzerHooks to do pretty much
> anything you want like that.
>
> You can use hooks to modify the job, after tables have been resolved.
>
>
> Ideally such a hook should not modify the plan (much), because it's too
> late to do it right.
>
> But I sometimes prototype Hive optimizer features as Hooks, like this one.
>
> https://github.com/t3rmin4t0r/captain-hook
>
>
> Cheers,
> Gopal
>
>



Re: How to Mount Node which is unhealthy state.

2016-12-26 Thread Mahender Sarangam
ping.

On 12/19/2016 12:13 PM, Mahender Sarangam wrote:
Hi,

Currently one of the nodes has gone into an unhealthy state and another node is LOST.
When we try to restart all services on the lost/unhealthy node in Ambari, we get the
error below. Is there any reason for a cluster node going into an unhealthy state?
Please help me bring the cluster back to a normal state.


* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
Directory /mnt/resource/hadoop/hdfs/data became unmounted from /mnt . Current 
mount point: / . Please ensure that mounts are healthy. If the mount change was 
intentional, you can update the contents of 
/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist.
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *

Traceback (most recent call last):
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py",
 line 174, in 
DataNode().execute()
  File 
"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
 line 280, in execute
method(env)
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py",
 line 61, in start
datanode(action="start")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", 
line 89, in thunk
return fn(*args, **kwargs)
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py",
 line 68, in datanode
create_log_dir=True
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py",
 line 269, in service
Execute(daemon_cmd, not_if=process_id_exists_command, 
environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
line 155, in __init__
self.env.run()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 160, in run
self.run_action(resource, action)
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 124, in run_action
provider_action()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
 line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 73, in inner
result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 103, in checked_call
tries=tries, try_sleep=try_sleep, 
timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 151, in _call_wrapper
result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 304, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs 
-l -s /bin/bash -c 'ulimit -c unlimited ;  
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config 
/usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting 
datanode, logging to 
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-wn47-lxcluster.out

here is he stdout file


The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call 
conf-select on it for version 2.5.1.0-56
2016-12-19 20:02:59,311 - Checking if need to create versioned conf dir 
/etc/hadoop/2.5.1.0-56/0
2016-12-19 20:02:59,312 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 
'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.1.0-56', 
'--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 
'stderr': -1}
2016-12-19 20:02:59,332 - call returned (1, '/etc/hadoop/2.5.1.0-56/0 exist 
already', '')
2016-12-19 20:02:59,332 - checked_call[('ambari-python-wrap', 
u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', 
'--stack-version', '2.5.1.0-56', '--conf-version', '0')] {'logoutput': False, 
'sudo': True, 'quiet': False}
2016-12-19 20:02:59,349 - checked_call returned (0, '')
2016-12-19 20:02:59,350 - Ensuring that hadoop has the correct symlink structure
2016-12-19 20:02:59,350 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/co

predicate push down on hive join.

2016-12-23 Thread Mahender Sarangam
Hi,

We are joining large tables, plus a couple of left joins of 3-4 tables against the
result of the large-table join. We have a question: is it better to keep a predicate
together with the JOIN condition or in the WHERE clause? I was going through the Apache
wiki and found the context below, but I couldn't understand the meaning of "Pushed" and
"Not Pushed". Can anyone throw some light on it?

[inline screenshot of the wiki's predicate pushdown table omitted]


  *   Another question: in the case of an inner join, is it better to keep the
predicate as part of the JOIN ON condition or in the WHERE clause? We have seen that if
we do the join and add the predicate in the WHERE clause it takes too much time, whereas
when we move the predicate into the JOIN ON condition it executes fast. Both tables are
large. Is this expected? Below is our table join condition:

Table2Detail T2
JOIN Table1Summary T1
ON T2.Nbr= T1.Nbr
AND T2.Year=T1.Year
AND T2.Month=T1.Month
AND  T1.Col1= T2.Col1
AND T2.Col2= T1.Col2
AND T1.Col2= 'XYZ'
AND T2.Col2= 'XYZ'


or

Table2Detail T2
JOIN Table1Summary T1
ON T2.Nbr= T1.Nbr
AND T2.Year=T1.Year
AND T2.Month=T1.Month
AND  T1.Col1= T2.Col1
AND T2.Col2= T1.Col2

Where T1.Col2= 'XYZ'  AND T2.Col2= 'XYZ'
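A short note with a hedged example (plain SQL semantics, not specific to any Hive
version): for an inner join the two placements are logically equivalent, so the optimizer
is free to push the filter either way; for an outer join they are not equivalent, which
is the distinction the wiki's "Pushed" / "Not Pushed" table is drawing.

-- LEFT JOIN with the filter in ON: unmatched left rows are kept (right side NULL)
SELECT *
FROM t1 LEFT JOIN t2
  ON t1.k = t2.k AND t2.Col2 = 'XYZ';

-- the same filter in WHERE removes those rows, effectively making it an inner join
SELECT *
FROM t1 LEFT JOIN t2
  ON t1.k = t2.k
WHERE t2.Col2 = 'XYZ';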
















How to Mount Node which is unhealthy state.

2016-12-19 Thread Mahender Sarangam
Hi,

Currently one of the nodes has gone into an unhealthy state and another node is LOST.
When we try to restart all services on the lost/unhealthy node in Ambari, we get the
error below. Is there any reason for a cluster node going into an unhealthy state?
Please help me bring the cluster back to a normal state.


* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
Directory /mnt/resource/hadoop/hdfs/data became unmounted from /mnt . Current 
mount point: / . Please ensure that mounts are healthy. If the mount change was 
intentional, you can update the contents of 
/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist.
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *
* WARNING * WARNING * WARNING * WARNING * WARNING *

Traceback (most recent call last):
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py",
 line 174, in 
DataNode().execute()
  File 
"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
 line 280, in execute
method(env)
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py",
 line 61, in start
datanode(action="start")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", 
line 89, in thunk
return fn(*args, **kwargs)
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py",
 line 68, in datanode
create_log_dir=True
  File 
"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py",
 line 269, in service
Execute(daemon_cmd, not_if=process_id_exists_command, 
environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
line 155, in __init__
self.env.run()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 160, in run
self.run_action(resource, action)
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 124, in run_action
provider_action()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
 line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 73, in inner
result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 103, in checked_call
tries=tries, try_sleep=try_sleep, 
timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 151, in _call_wrapper
result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 304, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs 
-l -s /bin/bash -c 'ulimit -c unlimited ;  
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config 
/usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting 
datanode, logging to 
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-wn47-lxcluster.out

here is he stdout file


The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call 
conf-select on it for version 2.5.1.0-56
2016-12-19 20:02:59,311 - Checking if need to create versioned conf dir 
/etc/hadoop/2.5.1.0-56/0
2016-12-19 20:02:59,312 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 
'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.1.0-56', 
'--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 
'stderr': -1}
2016-12-19 20:02:59,332 - call returned (1, '/etc/hadoop/2.5.1.0-56/0 exist 
already', '')
2016-12-19 20:02:59,332 - checked_call[('ambari-python-wrap', 
u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', 
'--stack-version', '2.5.1.0-56', '--conf-version', '0')] {'logoutput': False, 
'sudo': True, 'quiet': False}
2016-12-19 20:02:59,349 - checked_call returned (0, '')
2016-12-19 20:02:59,350 - Ensuring that hadoop has the correct symlink structure
2016-12-19 20:02:59,350 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-12-19 20:02:59,508 - The hadoop conf dir 
/usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for 
version 2.5.1.0-56
2016-12-19 20:02:59,510 - Checking if need to create versioned conf dir 
/etc/hadoop/2.5.1.0-56/0
2016-12-19 20:02:59,512 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 
'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.1.0-56', 
'--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 
'stderr': -1}
2016-12-19 20:02:59,5

How to move tasks under reducer to Mapper phase

2016-12-09 Thread Mahender Sarangam
Hi,

We are performing left joins across 5-6 large tables. We see the job hanging around
95%. All the mappers complete fast and some of the reducers also complete fast, but
some reducers stay in a hanging state because a single task is running on a large
amount of data. Below are the captured mapper and reducer stats.

[inline screenshot omitted]

  *   Is there a way to move work that runs in the reducer phase into the mapper phase,
e.g. by tweaking memory settings or modifying the query to have more mapper tasks than
reducer tasks? (See the sketch after this message.)

  *   Is there a way to know which part of the query a long-running task is executing,
or how many rows it is processing (so that I can think about partitioning or an
alternative approach)?
  *   Are there any other memory settings to resolve the hanging issue? Below are our
memory settings:

SET hive.tez.container.size = -1;
SET hive.execution.engine=tez;
SET hive.mapjoin.hybridgrace.hashtable=FALSE;
SET hive.optimize.ppd=true;
SET hive.cbo.enable =true;
SET hive.compute.query.using.stats =true;
SET hive.exec.parallel=true;
SET hive.vectorized.execution.enabled=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.auto.convert.join=false;
SET hive.auto.convert.join.noconditionaltask=false;
set hive.tez.java.opts = "-Xmx3481m";
set hive.tez.container.size = 4096;
--SET mapreduce.map.memory.mb=4096;
--SET mapreduce.map.java.opts = -Xmx3000M;
--SET mapreduce.reduce.memory.mb = 2048;
--SET mapreduce.reduce.java.opts = -Xmx1630M;
SET fs.block.size=67108864;


Thanks in advance


-Mahender
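A hedged sketch of settings that shift join work onto the map side instead of a single
overloaded reducer (standard Hive properties; the thresholds are illustrative, and note
that the script above currently disables map-join conversion explicitly):

-- let Hive turn joins against small enough tables into map-side (broadcast) joins
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=268435456;   -- ~256 MB of small-table data

-- if one reducer hangs because a handful of join keys dominate:
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;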





Re: ORC does not support type conversion from INT to STRING.

2016-07-19 Thread Mahender Sarangam
But we are using Hive 1.2 version

On 7/19/2016 12:43 PM, Mich Talebzadeh wrote:
in Hive 2,  I don't see this issue INSERT/SELECT from INT to String column!


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 19 July 2016 at 20:39, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:


Thanks Matthew,

We are currently on Hive 1.2. Is there any setting like
"hive.metastore.disallow.incompatible.col.type.changes=false;" in Hive 1.2, or any
workaround apart from reloading the entire table data? As a quick workaround we are
reloading the entire data set. Could you please share the JIRA for Schema Evolution
with us?

@Mich: Currently we have only primitive types, but I'm also interested to know how the
behaviour would be with complex types.

/Mahender
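A possible workaround sketch for Hive 1.2, assuming the underlying ORC files still
physically hold the column as INT (table and column names are illustrative): keep the
declared column type as INT, i.e. revert the ALTER, and cast at query time instead.

ALTER TABLE my_orc_table CHANGE col_id col_id INT;              -- restore the original declared type
SELECT CAST(col_id AS STRING) AS col_id_str FROM my_orc_table;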


On 7/18/2016 3:55 PM, Mich Talebzadeh wrote:
Hi Mathew,

In layman's term if I create the source ORC table column as INT and then create 
a target ORC table but that column has now been defined as STRING and do an 
INSERT/SELECT from source table how data is internally stored?

Is it implicitly converted into new format using CAST function or it is stored 
as is and just masked?

The version of Hive I am using is 2 and it works OK for primitive data types 
(insert/select from INT to String)

However, I believe Mahender is referring to Complex types?

Thanks




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 18 July 2016 at 22:31, Matthew McCline 
mailto:mmccl...@hortonworks.com>> wrote:


Hi Mahender,


Schema Evolution is available on the latest recent version of Hive.


For example, if you set 
hive.metastore.disallow.incompatible.col.type.changes=false;​ on master (i.e. 
hive2) it will support INT to STRING conversion.


If you need to remain on an older version, then you are out of luck.


Thanks,

Matt



From: Mahender Sarangam <mahender.bigd...@outlook.com>
Sent: Monday, July 18, 2016 1:59 PM
To: user@hive.apache.org
Subject: Re: ORC does not support type conversion from INT to STRING.


Hi Mich,

Sorry for delay in responding. here is the scenario,

We have created new cluster  and we have moved all ORC File data into new 
cluster. We have re-created table pointing to ORC location. We have modified 
data type of ORC table from INT to String. From then onward, we were unable to 
fire select statement against this ORC table, hive keep throwing exception, 
"Orc table select. Unable to convert Int to String". Looks like it is bug in 
ORC table only. Where in we modify the datatype from int to string, is causing 
problem with ORC reading/select statement, it throws exceptio. Please let me 
know if there are any workaround for this scenario. Is this behavior expected 
previously also.


/Mahender





On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
you must excuse my ignorance

can you please elaborate on this as there seems something has gone wrong 
somewhere?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com<http://talebzadehmich.wordpress.com/>



On 14 June 2016 at 19:42, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 14 June 2016 at 19:04, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing issue while reading data from ORC table. We have created ORC 
table and dumped data into it. We have deleted cluster due to some reason. When 
we recreated cluster (using Metastore) and table pointing to same location. 
When we perform reading from ORC table. We see below err

Re: ORC does not support type conversion from INT to STRING.

2016-07-19 Thread Mahender Sarangam

Thanks Matthew,

Currently we are in Hive 1.2 version only, Is there any setting like 
"hive.metastore.disallow.incompatible.col.type.changes=false;​" in Hive 1.2 or 
any around apart for reloading entire table data.  For Quick workaround, we are 
reloading entire data.
Can you please share with us Jira for Schema Evolution.


@Mich : Currently we have only primitive types. But I'm also interested to know 
"how the behavior will be  in complex types"


/Mahender


On 7/18/2016 3:55 PM, Mich Talebzadeh wrote:
Hi Mathew,

In layman's term if I create the source ORC table column as INT and then create 
a target ORC table but that column has now been defined as STRING and do an 
INSERT/SELECT from source table how data is internally stored?

Is it implicitly converted into new format using CAST function or it is stored 
as is and just masked?

The version of Hive I am using is 2 and it works OK for primitive data types 
(insert/select from INT to String)

However, I believe Mahender is referring to Complex types?

Thanks




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 18 July 2016 at 22:31, Matthew McCline 
mailto:mmccl...@hortonworks.com>> wrote:


Hi Mahender,


Schema Evolution is available on the latest recent version of Hive.


For example, if you set 
hive.metastore.disallow.incompatible.col.type.changes=false;​ on master (i.e. 
hive2) it will support INT to STRING conversion.


If you need to remain on an older version, then you are out of luck.


Thanks,

Matt


____
From: Mahender Sarangam <mahender.bigd...@outlook.com>
Sent: Monday, July 18, 2016 1:59 PM
To: user@hive.apache.org
Subject: Re: ORC does not support type conversion from INT to STRING.


Hi Mich,

Sorry for delay in responding. here is the scenario,

We have created new cluster  and we have moved all ORC File data into new 
cluster. We have re-created table pointing to ORC location. We have modified 
data type of ORC table from INT to String. From then onward, we were unable to 
fire select statement against this ORC table, hive keep throwing exception, 
"Orc table select. Unable to convert Int to String". Looks like it is bug in 
ORC table only. Where in we modify the datatype from int to string, is causing 
problem with ORC reading/select statement, it throws exceptio. Please let me 
know if there are any workaround for this scenario. Is this behavior expected 
previously also.


/Mahender





On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
you must excuse my ignorance

can you please elaborate on this as there seems something has gone wrong 
somewhere?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com<http://talebzadehmich.wordpress.com/>



On 14 June 2016 at 19:42, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 14 June 2016 at 19:04, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS











Re: ORC does not support type conversion from INT to STRING.

2016-07-18 Thread Mahender Sarangam
Hi Mich,

Sorry for the delay in responding. Here is the scenario:

We have created a new cluster and moved all the ORC file data into the new cluster.
We have re-created the table pointing to the ORC location. We have modified the data
type of the ORC table column from INT to STRING. From then onward, we were unable to
fire a SELECT statement against this ORC table; Hive keeps throwing the exception
"Orc table select. Unable to convert Int to String". It looks like a bug in ORC only:
modifying the data type from INT to STRING causes a problem with the ORC read/SELECT
statement, and it throws an exception. Please let me know if there is any workaround
for this scenario. Was this behavior also expected previously?


/Mahender





On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
you must excuse my ignorance

can you please elaborate on this as there seems something has gone wrong 
somewhere?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com<http://talebzadehmich.wordpress.com/>



On 14 June 2016 at 19:42, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 14 June 2016 at 19:04, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS









Re: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-07-12 Thread Mahender Sarangam
Thanks Dudu,

I would like to know how other projects deal with case insensitivity. Is everyone
converting to lower() or upper() in their joins? Is there any setting applied at the
Hive server level that gets reflected in all queries?


/MS

On 5/25/2016 9:05 AM, Markovitz, Dudu wrote:
It will not be suitable for JOIN operation since it will cause a Cartesian 
product.
Any chosen solution should determine a single representation for any given 
string.

Dudu

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Wednesday, May 25, 2016 1:31 AM
To: user 
Subject: Re: Any way in hive to have functionality like SQL Server collation on 
Case sensitivity

I would rather go for something like compare() 

 that allows one to directly compare two character strings based on alternate 
collation rules.

Hive does not have it. This is from SAP ASE

1> select compare ("aaa","bbb")
2> go
 ---
  -1
(1 row affected)
1> select compare ("aaa","Aaa")
2> go
 ---
   1
(1 row affected)

1> select compare ("aaa","AAA")
2> go
 ---
   1

•  The compare function returns the following values, based on the collation 
rules that you chose:

· 1 – indicates that char_expression1 or uchar_expression1 is greater 
than char_expression2 or uchar_expression2.

· 0 – indicates that char_expression1 or uchar_expression1 is equal to 
char_expression2 or uchar_expression2.

· -1 – indicates that char_expression1 or uchar_expression1 is less 
than char_expression2 or uchar expression2.

hive> select compare("aaa", "bbb");
FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'compare'


HTH




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 24 May 2016 at 21:15, mahender bigdata 
mailto:mahender.bigd...@outlook.com>> wrote:
Hi,

We would like to have a feature in Hive where string comparison ignores case while
joining on string columns. This feature would save us from calling upper() or lower()
on the join columns. If it already exists, please let me know the settings to enable
this feature.

/MS
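
In the absence of a server-level collation setting, the usual pattern is to normalize
the join keys explicitly (or to materialize an already-lowercased column at load
time). A minimal sketch, with made-up table and column names:

SELECT a.*, b.order_total
FROM   customers a
JOIN   orders    b
  ON   lower(a.cust_code) = lower(b.cust_code);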




Is there any GROUP_CONCAT Function in Hive

2016-06-15 Thread Mahender Sarangam
Hi,

We have a Hive table with 3 GB of data, around 100 rows. We are looking for any
functionality in Hive which can perform the GROUP_CONCAT function.

We tried to implement the GROUP_CONCAT function using collect_list and collect_set,
but we are getting a heap space error, because around 10 rows are present for each
group key and these rows need to be concatenated.

Is there any direct way to concatenate row data into a single string column by GROUP BY?
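
A GROUP_CONCAT-style aggregation can be written with collect_list plus concat_ws; a
minimal sketch with hypothetical table and column names (note that this still
materializes each group's values in memory, so very large groups can hit the same
heap limits):

SELECT group_key,
       concat_ws(',', collect_list(CAST(item AS STRING))) AS items
FROM   src_table
GROUP  BY group_key;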


Re: ORC does not support type conversion from INT to STRING.

2016-06-14 Thread Mahender Sarangam
Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



<http://talebzadehmich.wordpress.com/>http://talebzadehmich.wordpress.com



On 14 June 2016 at 19:04, Mahender Sarangam 
mailto:mahender.bigd...@outlook.com>> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS







Re: ORC does not support type conversion from INT to STRING.

2016-06-14 Thread Mahender Sarangam
ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS





ORC does not support type conversion from INT to STRING.

2016-06-13 Thread Mahender Sarangam
Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS




ORC does not support type conversion from INT to STRING.

2016-06-13 Thread Mahender Sarangam
Hi,

We are facing an issue while reading data from an ORC table. We created the ORC table
and dumped data into it. We then deleted the cluster for some reason. When we
recreated the cluster (using the metastore), with the table pointing to the same
location, and performed a read from the ORC table, we saw the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS




Re: Get 100 items in Comma Separated strings from Hive Column.

2016-06-10 Thread Mahender Sarangam
Thanks Dudu. This is a wonderful explanation. I'm very thankful.

On 6/10/2016 7:24 AM, Markovitz, Dudu wrote:
regexp_extract ('(,?[^,]*){0,10}',0)

(...){0,10}

The expression surrounded by brackets repeats 0 to 10 times.


(,?[…]*)

Optional comma followed by sequence (0 or more) of characters


[^,]

Any character which is not comma


regexp_extract (...,0)

0 stands for the whole expression
1 stands for the 1st expression which is surrounded by brackets (ordered by the 
opening brackets)
2 stands for the 2nd expression which is surrounded by brackets (ordered by the 
opening brackets)
3 stands for the 3rd expression which is surrounded by brackets (ordered by the 
opening brackets)
Etc.



regexp_replace (...,'((,?[^,]*){0,10}).*','$1')

Similar to regexp_extract but this time we’re not extracting the first 10 
tokens but replacing the whole expression with the first 10 tokens.
The expression that stands for the first 10 tokens is identical to the one we 
used in regexp_extract
.* stands for any character that repeats 0 or more times which represent 
anything following the first 10 tokens
$1 stands for the 1st expression which is surrounded by brackets (ordered by 
the opening brackets)


From: Mahender Sarangam [mailto:mahender.bigd...@outlook.com]
Sent: Friday, June 10, 2016 2:54 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Get 100 items in Comma Separated strings from Hive Column.


Thanks Dudu. I will check. Can you please throw some light on
regexp_replace (...,'((,?[^,]*){0,10}).*','$1') and regexp_extract ('(,?[^,]*){0,10}',0)?

On 6/9/2016 11:33 PM, Markovitz, Dudu wrote:
+ Improvement

The “Count” can be done in a cleaner way
(The previous way works also with simple ‘replace’)

hive> select RowID,length(regexp_replace(stringColumn,'[^,]',''))+1 as count 
from t;

1  2
2  5
3  24
4  17
5  8
6  11
7  26
8  18
9  9


From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 09, 2016 11:30 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Get 100 items in Comma Separated strings from Hive Column.


--  bash

mkdir t

cat>t/data.txt
1|44,85
2|56,37,83,68,43
3|33,48,42,18,23,80,31,86,48,42,37,52,99,55,93,1,63,67,32,75,44,57,70,2
4|77,26,95,53,11,99,74,82,7,55,75,6,32,87,75,99,80
5|48,78,39,62,16,44,43,63
6|35,97,99,19,22,50,29,84,82,25,77
7|80,43,82,94,81,58,70,8,70,6,62,100,60,84,55,24,100,75,84,15,53,5,19,45,61,73
8|66,44,66,4,80,72,81,63,51,24,51,77,87,85,10,36,43,2
9|39,64,29,14,9,42,66,56,33

hdfs dfs -put t /tmp


--  hive


create external table t
(
RowID   int
   ,stringColumnstring
)
row format delimited
fields terminated by '|'
location '/tmp/t'
;

select RowID,regexp_extract (stringColumn,'(,?[^,]*){0,10}',0) as 
string10,length(stringColumn)-length(regexp_replace(stringColumn,',',''))+1 as 
count from t;

1    44,85                            2
2    56,37,83,68,43                   5
3    33,48,42,18,23,80,31,86,48,42    24
4    77,26,95,53,11,99,74,82,7,55     17
5    48,78,39,62,16,44,43,63          8
6    35,97,99,19,22,50,29,84,82,25    11
7    80,43,82,94,81,58,70,8,70,6      26
8    66,44,66,4,80,72,81,63,51,24     18
9    39,64,29,14,9,42,66,56,33        9


Extracting the first 100 (10 in my example) tokens can be done with 
regexp_extract or regexp_replace

hive> select regexp_extract 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','(,?[^,]*){0,10}',0);

1,2,3,4,5,6,7,8,9,10

hive> select regexp_replace 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','((,?[^,]*){0,10}).*','$1');

1,2,3,4,5,6,7,8,9,10


From: Mahender Sarangam [mailto:mahender.bigd...@outlook.com]
Sent: Thursday, June 09, 2016 7:13 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Get 100 items in Comma Separated strings from Hive Column.


Hi,

We have hive table which has a single column with more than 1000 comma 
separated string items.  Is there a way to retrieve only 100 string items from 
that Column. Also we need to capture number of comma separated string items. We 
are looking for more of   "substring_index" functionality, since we are using 
Hive 1.2 version, we couldn't find "substring_index" UDF function, Is there a 
way to achieve the same functionality with  "regexp_extract" and I also see 
there is UDF available not sure whether this helps us ach

Re: Get 100 items in Comma Separated strings from Hive Column.

2016-06-10 Thread Mahender Sarangam
Thanks Dudu. I will check. Can you please throw some light on
regexp_replace (...,'((,?[^,]*){0,10}).*','$1') and regexp_extract ('(,?[^,]*){0,10}',0)?

On 6/9/2016 11:33 PM, Markovitz, Dudu wrote:
+ Improvement

The “Count” can be done in a cleaner way
(The previous way works also with simple ‘replace’)

hive> select RowID,length(regexp_replace(stringColumn,'[^,]',''))+1 as count 
from t;

1  2
2  5
3  24
4  17
5  8
6  11
7  26
8  18
9  9


From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 09, 2016 11:30 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Get 100 items in Comma Separated strings from Hive Column.


--  bash

mkdir t

cat>t/data.txt
1|44,85
2|56,37,83,68,43
3|33,48,42,18,23,80,31,86,48,42,37,52,99,55,93,1,63,67,32,75,44,57,70,2
4|77,26,95,53,11,99,74,82,7,55,75,6,32,87,75,99,80
5|48,78,39,62,16,44,43,63
6|35,97,99,19,22,50,29,84,82,25,77
7|80,43,82,94,81,58,70,8,70,6,62,100,60,84,55,24,100,75,84,15,53,5,19,45,61,73
8|66,44,66,4,80,72,81,63,51,24,51,77,87,85,10,36,43,2
9|39,64,29,14,9,42,66,56,33

hdfs dfs -put t /tmp


--  hive


create external table t
(
RowID   int
   ,stringColumnstring
)
row format delimited
fields terminated by '|'
location '/tmp/t'
;

select RowID,regexp_extract (stringColumn,'(,?[^,]*){0,10}',0) as 
string10,length(stringColumn)-length(regexp_replace(stringColumn,',',''))+1 as 
count from t;

1    44,85                            2
2    56,37,83,68,43                   5
3    33,48,42,18,23,80,31,86,48,42    24
4    77,26,95,53,11,99,74,82,7,55     17
5    48,78,39,62,16,44,43,63          8
6    35,97,99,19,22,50,29,84,82,25    11
7    80,43,82,94,81,58,70,8,70,6      26
8    66,44,66,4,80,72,81,63,51,24     18
9    39,64,29,14,9,42,66,56,33        9


Extracting the first 100 (10 in my example) tokens can be done with 
regexp_extract or regexp_replace

hive> select regexp_extract 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','(,?[^,]*){0,10}',0);

1,2,3,4,5,6,7,8,9,10

hive> select regexp_replace 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','((,?[^,]*){0,10}).*','$1');

1,2,3,4,5,6,7,8,9,10


From: Mahender Sarangam [mailto:mahender.bigd...@outlook.com]
Sent: Thursday, June 09, 2016 7:13 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Get 100 items in Comma Separated strings from Hive Column.


Hi,

We have hive table which has a single column with more than 1000 comma 
separated string items.  Is there a way to retrieve only 100 string items from 
that Column. Also we need to capture number of comma separated string items. We 
are looking for more of   "substring_index" functionality, since we are using 
Hive 1.2 version, we couldn't find "substring_index" UDF function, Is there a 
way to achieve the same functionality with  "regexp_extract" and I also see 
there is UDF available not sure whether this helps us achieving same 
functionality. 
https://github.com/brndnmtthws/facebook-hive-udfs/blob/master/src/main/java/com/facebook/hive/udf/UDFRegexpExtractAll.java

Scenario : Table1 (Source Table)

RowID stringColumn

1 1,2,3,4...1

2 2,4,5,8,4

3 10,11,98,100

Now i Would like to show table result structure like below

Row ID 100String count

1 1,2,3...100 1

2 2,4,5,8,4 5



Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Mahender Sarangam
Hi,

We have a Hive table which has a single column with more than 1000 comma-separated
string items. Is there a way to retrieve only 100 string items from that column? We
also need to capture the number of comma-separated string items. We are looking for
something like the "substring_index" functionality; since we are using Hive 1.2, we
couldn't find a "substring_index" UDF. Is there a way to achieve the same
functionality with "regexp_extract"? I also see there is a UDF available, but I'm not
sure whether this helps us achieve the same functionality:
https://github.com/brndnmtthws/facebook-hive-udfs/blob/master/src/main/java/com/facebook/hive/udf/UDFRegexpExtractAll.java

Scenario : Table1 (Source Table)

RowID stringColumn

1 1,2,3,4...1

2 2,4,5,8,4

3 10,11,98,100

Now i Would like to show table result structure like below

Row ID 100String count

1 1,2,3...100 1

2 2,4,5,8,4 5


Query Failing while querying on ORC Format

2016-05-14 Thread Mahender Sarangam
Hi,
We are dumping our data into an ORC partitioned, bucketed table. We have loaded
almost 6 months of data, and month is the partition column. We have now modified the
ORC partitioned bucketed table schema by adding 2 more columns to the ORC table.
Whenever we run a SELECT statement for an older month that does not have the new
columns (even though these columns are not part of the SELECT clause / projection
columns), it throws an exception.
 
A JIRA bug for this kind of requirement has already been raised:
https://issues.apache.org/jira/browse/HIVE-11981
 
Can anyone please tell me an alternative workaround for reading the old, previous
columns of the ORC partitioned table?
 
Thanks
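
One workaround that is sometimes suggested for this situation (a sketch only, with
hypothetical table and column names): add the new columns to the metadata of the
partitions that predate the schema change, and use CASCADE (Hive 1.1+) for future
table-level changes so partition schemas are updated automatically.

-- Patch an older partition so its schema matches the current table definition.
ALTER TABLE events_orc PARTITION (month='2016-01')
  ADD COLUMNS (new_col1 STRING, new_col2 INT);

-- For future changes, CASCADE propagates the alteration to all existing partitions.
-- ALTER TABLE events_orc ADD COLUMNS (new_col3 STRING) CASCADE;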
  

RE: Having Issue with Join Statement

2016-04-16 Thread Mahender Sarangam
Ping. Urgent help on this...
 
From: mahender.bigd...@outlook.com
To: user@hive.apache.org; yarn-iss...@hadoop.apache.org
Subject: Having Issue with Join Statement
Date: Sun, 17 Apr 2016 00:36:24 +




Hi,
 
I'm doing a join of 2 large tables along with a couple of CTEs. I'm getting the
following error: "Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Hive Runtime Error while closing operators:
java.lang.RuntimeException: java.io.IOException: Please check if you are
invoking moveToNext() even after it returned false."
 
Has anyone faced or come across the same issue? I see a JIRA saying it has already
been fixed. We are using Hive 1.2.
https://issues.apache.org/jira/browse/HIVE-11016
 
 
Is there any workaround for this issue, or what is the reason for this error? Is
there any missing key?
 
NFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
270(+79,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
289(+82,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
309(+84,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
312(+0,-4)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Reducer 21, 
vertexId=vertex_1460713464534_0363_1_09, diagnostics=[Task failed, 
taskId=task_1460713464534_0363_1_09_000232, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: Hive Runtime Error while closing operators: 
java.lang.RuntimeException: java.io.IOException: Please check if you are 
invoking moveToNext() even after it returned false.
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
operators: java.lang.RuntimeException: java.io.IOException: Please check if you 
are invoking moveToNext() even after it returned false.
 at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:310)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
 ... 14 more
Caused by: org.apache.hadoop.h

Having Issue with Join Statement

2016-04-16 Thread Mahender Sarangam
Hi,
 
I'm doing a join of 2 large tables along with a couple of CTEs. I'm getting the
following error: "Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Hive Runtime Error while closing operators:
java.lang.RuntimeException: java.io.IOException: Please check if you are
invoking moveToNext() even after it returned false."
 
Has anyone faced or come across the same issue? I see a JIRA saying it has already
been fixed. We are using Hive 1.2.
https://issues.apache.org/jira/browse/HIVE-11016
 
 
Is there any workaround for this issue, or what is the reason for this error? Is
there any missing key?
 
NFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
270(+79,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
289(+82,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
309(+84,-3)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
INFO  : Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 16: 1/1 Map 17: 1/1 
Map 18: 26/26 Map 19: 1/1 Map 20: 1/1 Map 22: 145/145 Map 25: 4/4 Map 26: 4/4 
Map 27: 39/39 Map 28: 1/1 Map 29: 1/1 Map 30: 1/1 Map 31: 1/1 Map 32: 1/1 
Reducer 10: 0/1009 Reducer 14: 1/1 Reducer 15: 1/1 Reducer 2: 1/1 Reducer 21: 
312(+0,-4)/469 Reducer 23: 134/134 Reducer 24: 427/427 Reducer 3: 1/1 Reducer 
4: 0/1009 Reducer 5: 0/1009 Reducer 6: 0/1009 Reducer 7: 0/1009 Reducer 8: 
0/1009 Reducer 9: 0/1009 
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Reducer 21, 
vertexId=vertex_1460713464534_0363_1_09, diagnostics=[Task failed, 
taskId=task_1460713464534_0363_1_09_000232, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: Hive Runtime Error while closing operators: 
java.lang.RuntimeException: java.io.IOException: Please check if you are 
invoking moveToNext() even after it returned false.
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
 at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
operators: java.lang.RuntimeException: java.io.IOException: Please check if you 
are invoking moveToNext() even after it returned false.
 at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:310)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
 ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: java.io.IOException: Please check if you are 
invoking moveToNext() even after it returned false.
 at 
org.apache.hadoop.hive.ql.exec.CommonMergeJ

RE: Hadoop 2.6 version https://issues.apache.org/jira/browse/YARN-2624

2016-03-25 Thread Mahender Sarangam
any update on this..
 
> Subject: Re: Hadoop 2.6 version 
> https://issues.apache.org/jira/browse/YARN-2624
> To: user@hive.apache.org
> From: mahender.bigd...@outlook.com
> Date: Thu, 24 Mar 2016 12:20:57 -0700
> 
> 
> Is there any other way to set the NM node cache directory? I'm using a Windows 
> cluster with the Hortonworks HDP system.
> 
> /mahender
> On 3/24/2016 11:27 AM, mahender bigdata wrote:
> > Hi,
> >
> > Has any one is holding work around for this bug, Looks like this 
> > problem still persists in hadoop 2.6. Templeton Job get failed as soon 
> > as job is submitted. Please let us know as early as possible
> >
> > Application application_1458842675930_0002 failed 2 times due to AM 
> > Container for appattempt_1458842675930_0002_02 exited with 
> > exitCode: -1000
> > For more detailed output, check application tracking 
> > page:http://headnodehost:9014/cluster/app/application_1458842675930_0002Then,
> >  
> > click on links to logs of each attempt.
> > Diagnostics: Rename cannot overwrite non empty destination directory 
> > c:/apps/temp/hdfs/nm-local-dir/usercache/XXX/filecache/18
> > java.io.IOException: Rename cannot overwrite non empty destination 
> > directory c:/apps/temp/hdfs/nm-local-dir/usercache//filecache/18
> > at 
> > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:735)
> >  
> >
> > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:236)
> > at 
> > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
> > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
> > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
> > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at 
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >  
> >
> > at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >  
> >
> > at java.lang.Thread.run(Thread.java:745)
> > Failing this attempt. Failing the application.
> >
> >
> > /Mahender
> >
> >
> >
> >
> 
  

Implementing PIVOT in Hive

2016-03-19 Thread Mahender Sarangam
Hi Team,
 
We are looking at pivoting some columns as rows. Is there any support for pivoting
(PIVOT) in Hive?
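
Hive has no PIVOT keyword, but the usual substitute is conditional aggregation; a
minimal sketch with a hypothetical sales(region, quarter, amount) table:

SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_amount,
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_amount,
       SUM(CASE WHEN quarter = 'Q3' THEN amount ELSE 0 END) AS q3_amount,
       SUM(CASE WHEN quarter = 'Q4' THEN amount ELSE 0 END) AS q4_amount
FROM   sales
GROUP  BY region;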
 
 
  

RE: having problem while querying out select statement in TEZ

2016-03-01 Thread Mahender Sarangam
Any update ?
 
To: user@hive.apache.org
From: mahender.bigd...@outlook.com
Subject: having problem while querying out select statement in TEZ
Date: Tue, 1 Mar 2016 12:55:20 -0800


  


  
  
Hi,

We have created an ORC partitioned, bucketed table in Hive with ~ as the delimiter.
Whenever I fire a SELECT statement on the ORC partitioned bucketed table, I keep
getting the error:
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector

select ID, ..columnname.. from the ORC partitioned bucketed table. I see this issue
reproduced in https://issues.apache.org/jira/browse/HIVE-6349

The data was inserted properly, previously using Hive 0.13; when we read the data in
Hive 1.2, we see the issue. Has anyone faced the same issue? Please let me know the
reason for this issue; I couldn't figure out the root cause of this error.
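
One thing worth trying for ClassCastExceptions that mention *ColumnVector classes (an
assumption, not a confirmed fix for this case) is to disable vectorized execution for
the session and re-run the query; if the error disappears, the problem is in the
vectorized ORC read path rather than in the data itself:

-- Disable vectorized reads for this session only (hypothetical table name below).
SET hive.vectorized.execution.enabled=false;

SELECT id, some_column
FROM   orc_partition_bucket_table;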

  

 

  


Vertex failed, vertexName=Map 1, vertexId=vertex_1456489763556_0037_1_00, diagnostics=[Task failed, taskId=task_1456489763556_0037_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:310)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRec

Any Built-in ISNumeric() UDF in Hive

2015-12-19 Thread Mahender Sarangam
Hi,
 
I would like to know whether there is any UDF like ISNumeric() to check whether data
is numeric or not. I can make use of CAST to convert the data, but I would like to
know if any such UDF is available.
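
Two common substitutes, sketched with a hypothetical table t and column col: a regex
test with RLIKE, or relying on CAST returning NULL when the value cannot be converted.

-- 1) Regex check: true when the value looks like an (optionally signed) number.
SELECT col, col RLIKE '^[-+]?[0-9]*\\.?[0-9]+$' AS looks_numeric FROM t;

-- 2) CAST check: a failed conversion yields NULL.
SELECT col, CAST(col AS DOUBLE) IS NOT NULL AS casts_to_double FROM t;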
 
 
Can anyone share samples with me? If I create a UDF in Pig, how can I make use of it
in a Hive script?
 
 
Thanks. 
  

RE: Hive Support for Unicode languages

2015-12-05 Thread Mahender Sarangam
It's Windows Server 2012 OS.
 
> From: jornfra...@gmail.com
> Subject: Re: Hive Support for Unicode languages
> Date: Fri, 4 Dec 2015 13:19:00 +0100
> To: user@hive.apache.org
> 
> What operating system are you using?
> 
> > On 04 Dec 2015, at 01:25, mahender bigdata  
> > wrote:
> > 
> > Hi Team,
> > 
> > Does hive supports Hive Unicode like UTF-8,UTF-16 and UTF-32. I would like 
> > to see different language supported in hive table. Is there any serde which 
> > can show exactly japanese, chineses character rather than showing symbols 
> > on Hive console.
> > 
> > -Mahender
  

RE: Building Rule Engine/ Rule Transformation

2015-11-29 Thread Mahender Sarangam
We are not experts in Java programming; we work with .NET code, but there is no
support for .NET UDFs.

Subject: Re: Building Rule Engine/ Rule Transformation
From: jornfra...@gmail.com
Date: Sun, 29 Nov 2015 11:33:05 +0100
CC: user-h...@hive.apache.org
To: user@hive.apache.org

Why not implement Hive UDF in Java?
On 28 Nov 2015, at 21:26, Mahender Sarangam  
wrote:




 Hi team,
 
We need expert input on how to implement a rule engine in Hive. Do you have any
references available for implementing rules in Hive/Pig?

 
We are migrating our stored procedures into multiple Hive queries, but maintenance is
becoming complex. Hive is not a procedural language, so we cannot write IF/ELSE logic
or any other procedural construct. Can anyone suggest which HDP technology can be
helpful as a procedural-language replacement? We are thinking of Pig; can it be a
good fit for performing rule/data transformations?
 
Our data is in a structured table with around 250 columns. We have rules like:
update columns based on lookup table values, delete rows which don't satisfy a
condition, update statements that touch multiple columns (CASE expressions etc.), and
some date-conversion columns. Please suggest the best way to implement this rule
engine. Previously we used SQL Server for the rule engine; now that we have migrated
the application to Big Data, we are looking for any references available to perform
this rule transformation.
 
 
Some of our findings are:

Make use of Hive stream writing or Pig.
 
We are .NET developers. Can we think of writing a .EXE application, streaming
row-wise data to the .EXE, and applying rules on top of each row? Would that be a
better solution, or is it better to implement in Pig? Implementing in Pig doesn't
fetch us much benefit when compared with Hive. Can you please comment on the above
approaches?
 
Thanks,
Mahender  
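
For the streaming option mentioned above, Hive's TRANSFORM syntax pipes each row
through an external program; a minimal sketch (the script name, columns, and table
are made up for illustration):

-- Register the hypothetical rule script with the session.
ADD FILE /tmp/apply_rules.py;

-- Stream rows through the script; it reads tab-separated lines on stdin and
-- writes tab-separated lines on stdout.
SELECT TRANSFORM (id, col1, col2)
       USING 'python apply_rules.py'
       AS (id STRING, col1 STRING, col2 STRING)
FROM   staging_table;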
  

Building Rule Engine/ Rule Transformation

2015-11-28 Thread Mahender Sarangam
 Hi team,
 
We need expert input on how to implement a rule engine in Hive. Do you have any
references available for implementing rules in Hive/Pig?

 
We are migrating our stored procedures into multiple Hive queries, but maintenance is
becoming complex. Hive is not a procedural language, so we cannot write IF/ELSE logic
or any other procedural construct. Can anyone suggest which HDP technology can be
helpful as a procedural-language replacement? We are thinking of Pig; can it be a
good fit for performing rule/data transformations?
 
Our data is in a structured table with around 250 columns. We have rules like:
update columns based on lookup table values, delete rows which don't satisfy a
condition, update statements that touch multiple columns (CASE expressions etc.), and
some date-conversion columns. Please suggest the best way to implement this rule
engine. Previously we used SQL Server for the rule engine; now that we have migrated
the application to Big Data, we are looking for any references available to perform
this rule transformation.
 
 
Some of our findings are:

Make use of Hive stream writing or Pig.
 
We are .NET developers. Can we think of writing a .EXE application, streaming
row-wise data to the .EXE, and applying rules on top of each row? Would that be a
better solution, or is it better to implement in Pig? Implementing in Pig doesn't
fetch us much benefit when compared with Hive. Can you please comment on the above
approaches?
 
Thanks,
Mahender