Re: stop hive from generating job file for every query

2014-07-30 Thread Navis류승우
Set value of "hive.querylog.location" to empty string in hive-site.xml.

Thanks,
Navis


2014-07-31 13:08 GMT+09:00 Gitansh Chadha :

> Hi,
>
> I want to stop hive commands from generating the hive job file (under
> /tmp/user/hive_log_job*) for every query, as we run multiple queries in
> batch and the file is getting really big. (1GB+)
>
> what would be the best way to do it?
>
> Thanks in advance,
> g
>


stop hive from generating job file for every query

2014-07-30 Thread Gitansh Chadha
Hi,

I want to stop hive commands from generating the hive job file (under
/tmp/user/hive_log_job*) for every query, as we run multiple queries in
batch and the file is getting really big. (1GB+)

what would be the best way to do it?

Thanks in advance,
g


stop hive shell from creating hive job files

2014-07-30 Thread Gitansh Chadha
Hi,

I want to stop hive commands from generating the hive job file (under
/tmp/user/hive_log_job*) for every query, as we run multiple queries in
batch and the file is getting really big. (1GB+)

what would be the best way to do it?

Thanks in advance,
g


Re: hive auto join conversion

2014-07-30 Thread Eugene Koifman
Would manually rewriting the query from (T1 union all T2) LOJ S to the
equivalent (T1 LOJ S) union all (T2 LOJ S) help work around this issue?
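
A sketch of that rewrite, reusing the names from Chen's query below (untested;
note that "MapJoin Followed by Union" is also on the 0.11+ restriction list
Chen quotes, so the optimizer may still decline the hints):

select c1, c2, c
from
(
select /*+ MAPJOIN(sup) */ t1.c1, t1.c2, sup.c
from table1 t1
left outer join sup on (t1.c1 = sup.key)
union all
select /*+ MAPJOIN(sup) */ t2.c1, t2.c2, sup.c
from table2 t2
left outer join sup on (t2.c1 = sup.key)
) joined
distribute by c1;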


On Wed, Jul 30, 2014 at 6:19 PM, Chen Song  wrote:

> I tried that and I got the following error.
>
> FAILED: SemanticException [Error 10227]: Not all clauses are supported
> with mapjoin hint. Please remove mapjoin hint.
>
> I then tried turning off auto join conversion.
>
> set hive.auto.convert.join=false
>
> But no luck, same error.
>
> Looks like it is a known issue,
>
>
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.2-5-2.html
>
> Chen
>
>
>
>
> On Wed, Jul 30, 2014 at 9:10 PM, Navis류승우  wrote:
>
>> Could you do it with hive.ignore.mapjoin.hint=false? Mapjoin hint is
>> ignored from hive-0.11.0 by default (see
>> https://issues.apache.org/jira/browse/HIVE-4042)
>>
>> Thanks,
>> Navis
>>
>>
>> 2014-07-31 10:04 GMT+09:00 Chen Song :
>>
>> I am using cdh5 with hive 0.12. We have some hive jobs migrated from hive
>>> 0.10 and they are written like below:
>>>
>>> select /*+ MAPJOIN(sup) */ c1, c2, sup.c
>>> from
>>> (
>>> select key, c1, c2 from table1
>>> union all
>>> select key, c1, c2 from table2
>>> ) table
>>> left outer join
>>> sup
>>> on (table.c1 = sup.key)
>>> distribute by c1
>>>
>>> In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
>>> (map only job), followed by a regular MR job for distribute by.
>>>
>>> In Hive 0.12 (CDH5), Hive is not able to convert the join into a map
>>> join. Instead it launches a common map reduce for the join, followed by
>>> another mr for distribute by. However, when I take out the union all
>>> operator, Hive seems to be able to create a single MR job, with map join on
>>> map phase, and reduce for distribute by.
>>>
>>> I read a bit on
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>>> and found out that there are some restrictions on map side join starting
>>> Hive 0.11. The following are not supported.
>>>
>>>
>>>- Union Followed by a MapJoin
>>>- Lateral View Followed by a MapJoin
>>>- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
>>>Followed by MapJoin
>>>- MapJoin Followed by Union
>>>- MapJoin Followed by Join
>>>- MapJoin Followed by MapJoin
>>>
>>>
>>> So if one side of the table (big side) is a union of some tables and the
>>> other side is a small table, Hive would not be able to do a map join at
>>> all? Is that correct?
>>>
>>> If correct, what should I do to make the job backward compatible?
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
>
> --
> Chen Song
>
>


-- 

Thanks,
Eugene



Re: hive auto join conversion

2014-07-30 Thread Chen Song
I tried that and I got the following error.

FAILED: SemanticException [Error 10227]: Not all clauses are supported with
mapjoin hint. Please remove mapjoin hint.

I then tried turning off auto join conversion.

set hive.auto.convert.join=false

But no luck, same error.

Looks like it is a known issue,

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.2-5-2.html

Chen




On Wed, Jul 30, 2014 at 9:10 PM, Navis류승우  wrote:

> Could you do it with hive.ignore.mapjoin.hint=false? Mapjoin hint is
> ignored from hive-0.11.0 by default (see
> https://issues.apache.org/jira/browse/HIVE-4042)
>
> Thanks,
> Navis
>
>
> 2014-07-31 10:04 GMT+09:00 Chen Song :
>
> I am using cdh5 with hive 0.12. We have some hive jobs migrated from hive
>> 0.10 and they are written like below:
>>
>> select /*+ MAPJOIN(sup) */ c1, c2, sup.c
>> from
>> (
>> select key, c1, c2 from table1
>> union all
>> select key, c1, c2 from table2
>> ) table
>> left outer join
>> sup
>> on (table.c1 = sup.key)
>> distribute by c1
>>
>> In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
>> (map only job), followed by a regular MR job for distribute by.
>>
>> In Hive 0.12 (CDH5), Hive is not able to convert the join into a map
>> join. Instead it launches a common map reduce for the join, followed by
>> another mr for distribute by. However, when I take out the union all
>> operator, Hive seems to be able to create a single MR job, with map join on
>> map phase, and reduce for distribute by.
>>
>> I read a bit on
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>> and found out that there are some restrictions on map side join starting
>> Hive 0.11. The following are not supported.
>>
>>
>>- Union Followed by a MapJoin
>>- Lateral View Followed by a MapJoin
>>- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
>>Followed by MapJoin
>>- MapJoin Followed by Union
>>- MapJoin Followed by Join
>>- MapJoin Followed by MapJoin
>>
>>
>> So if one side of the table (big side) is a union of some tables and the
>> other side is a small table, Hive would not be able to do a map join at
>> all? Is that correct?
>>
>> If correct, what should I do to make the job backward compatible?
>>
>> --
>> Chen Song
>>
>>
>


-- 
Chen Song


Re: hive auto join conversion

2014-07-30 Thread Navis류승우
Could you try it with hive.ignore.mapjoin.hint=false? The mapjoin hint has
been ignored by default since hive-0.11.0 (see
https://issues.apache.org/jira/browse/HIVE-4042)
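
A minimal sketch of the session setting (the hinted query itself is your
original one, quoted below, re-run unchanged):

set hive.ignore.mapjoin.hint=false;
select /*+ MAPJOIN(sup) */ c1, c2, sup.c
from ...;  -- rest of the original query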

Thanks,
Navis


2014-07-31 10:04 GMT+09:00 Chen Song :

> I am using cdh5 with hive 0.12. We have some hive jobs migrated from hive
> 0.10 and they are written like below:
>
> select /*+ MAPJOIN(sup) */ c1, c2, sup.c
> from
> (
> select key, c1, c2 from table1
> union all
> select key, c1, c2 from table2
> ) table
> left outer join
> sup
> on (table.c1 = sup.key)
> distribute by c1
>
> In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
> (map only job), followed by a regular MR job for distribute by.
>
> In Hive 0.12 (CDH5), Hive is not able to convert the join into a map join.
> Instead it launches a common map reduce for the join, followed by another
> mr for distribute by. However, when I take out the union all operator, Hive
> seems to be able to create a single MR job, with map join on map phase, and
> reduce for distribute by.
>
> I read a bit on
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
> and found out that there are some restrictions on map side join starting
> Hive 0.11. The following are not supported.
>
>
>- Union Followed by a MapJoin
>- Lateral View Followed by a MapJoin
>- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
>Followed by MapJoin
>- MapJoin Followed by Union
>- MapJoin Followed by Join
>- MapJoin Followed by MapJoin
>
>
> So if one side of the table (big side) is a union of some tables and the
> other side is a small table, Hive would not be able to do a map join at
> all? Is that correct?
>
> If correct, what should I do to make the job backward compatible?
>
> --
> Chen Song
>
>


hive auto join conversion

2014-07-30 Thread Chen Song
I am using CDH5 with Hive 0.12. We have some Hive jobs migrated from Hive
0.10 that are written like the one below:

select /*+ MAPJOIN(sup) */ c1, c2, sup.c
from
(
select key, c1, c2 from table1
union all
select key, c1, c2 from table2
) table
left outer join
sup
on (table.c1 = sup.key)
distribute by c1

In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
(map only job), followed by a regular MR job for distribute by.

In Hive 0.12 (CDH5), Hive is not able to convert the join into a map join.
Instead it launches a common map reduce for the join, followed by another
mr for distribute by. However, when I take out the union all operator, Hive
seems to be able to create a single MR job, with map join on map phase, and
reduce for distribute by.

I read a bit on
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
and found out that there are some restrictions on map side join starting
Hive 0.11. The following are not supported.


   - Union Followed by a MapJoin
   - Lateral View Followed by a MapJoin
   - Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By) Followed
   by MapJoin
   - MapJoin Followed by Union
   - MapJoin Followed by Join
   - MapJoin Followed by MapJoin


So if one side of the table (big side) is a union of some tables and the
other side is a small table, Hive would not be able to do a map join at
all? Is that correct?

If correct, what should I do to make the job backward compatible?

-- 
Chen Song


change column type of orc table will throw exception in query time

2014-07-30 Thread wzc
hi,
 Currently, if we change the column type of an ORC-format Hive table using
"alter table orc_table change c1 c1 bigint", it throws an exception from the
SerDe ("org.apache.hadoop.io.IntWritable cannot be cast to
org.apache.hadoop.io.LongWritable") at query time. This differs from Hive's
behavior with other file formats, where it tries to cast the value (yielding
null for incompatible types).
  I found HIVE-6784, which appears to be the same issue for Parquet, though
it says that partitioned tables currently work:

> The exception raised from changing type actually only happens to
>> non-partitioned tables. For partitioned tables, if there is type change in
>> table level, there will be an ObjectInspectorConverter (in parquet's case —
>> StructConverter) to convert type between partition and table. For
>> non-partitioned tables, the ObjectInspectorConverter is always
>> IdentityConverter, which passes the deserialized object as it is, causing
>> type mismatch between object and ObjectInspector.
>
>
>
  According to my test with Hive branch-0.13, it still fails with a
partitioned ORC table. I think this behavior is unexpected, and I'm digging
into the code to find a way to fix it now. Any help is appreciated.





I use the following script to test it with partitioned table on branch-0.13:

use test;
> DROP TABLE if exists orc_change_type_staging;
> DROP TABLE if exists orc_change_type;
> CREATE TABLE orc_change_type_staging (
> id int
> );
> CREATE TABLE orc_change_type (
> id int
> ) PARTITIONED BY (`dt` string)
> stored as orc;
> --- load staging table
> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO
> TABLE orc_change_type_staging;
> --- populate orc hive table
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select *
> FROM orc_change_type_staging;
> --- change column id from int to bigint
> ALTER TABLE orc_change_type CHANGE id id bigint;
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select *
> FROM orc_change_type_staging;
> SELECT id FROM orc_change_type where dt between '20140718' and '20140719';


and it throw exception with branch-0.13:

> Error: java.io.IOException: java.io.IOException:
> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
> cast to org.apache.hadoop.io.LongWritable
> at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.lang.ClassCastException:
> org.apache.hadoop.io.IntWritable cannot be cast to
> org.apache.hadoop.io.LongWritable
> at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
> at
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
> ... 11 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable
> cannot be cast to org.apache.hadoop.io.LongWritable
> at
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
> at
> org.apache.hadoop.hiv

Re: Why does SMB join generate hash table locally, even if input tables are large?

2014-07-30 Thread Pala M Muthaia
+hive-users


On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia  wrote:

> Hi,
>
> I am testing SMB join for 2 large tables. The tables are bucketed and
> sorted on the join column. I notice that even though the table is large,
> Hive attempts to generate a hash table for the 'small' table locally,
> similar to a map join. Since the table is large in my case, the client runs
> out of memory and the query fails.
>
> I am using Hive 0.12 with the following settings:
>
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.input.format =
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>
> My test query does a simple join and a select, no subqueries/nested
> queries etc.
>
> I understand why a (bucket) map join requires hash table generation, but
> why is that included for an SMB join? Shouldn't an SMB join just spin up one
> mapper for each bucket and perform a sort merge join directly on the mapper?
>
>
> Thanks,
> pala
>
>
>
>


case statement in SELECT TRANSFORM

2014-07-30 Thread Kevin Weiler
Is it possible to have CASE or SUM statements inside of a TRANSFORM selection? 
When I do it now, I get the following error:

FAILED: ParseException line 41:10 mismatched input 'AS' expecting ) near 'END' 
in transform clause
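
For reference, the usual workaround (a hedged sketch; the table, columns, and
script name are hypothetical) is to evaluate the CASE in an inner query and
feed the already-computed column to TRANSFORM:

SELECT TRANSFORM (t.key, t.flag)
USING 'my_script.py'
AS (key STRING, result STRING)
FROM (
  SELECT key,
         CASE WHEN value > 0 THEN 'pos' ELSE 'nonpos' END AS flag
  FROM src
) t;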

--
Kevin Weiler
IT
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com






Re: hive udf cannot recognize generic method

2014-07-30 Thread Jason Dere
Sounds like you are using the older-style UDF class. In that case, yes, you
would have to override evaluate() for each type of input.
You could also try extending the GenericUDF class - that would let you handle
all types in a single method, though it may be a bit more complicated (you can
look at the Hive code for some examples).


On Jul 30, 2014, at 7:43 AM, Dan Fan  wrote:

> Hi there 
> 
> I am writing a hive UDF function. The input could be string, int, double etc.
> The return is based on the data type. I was trying to use the generic method, 
> however, hive seems not recognize it. 
> Here is the piece of code I have as example.
> 
>   public <T> T evaluate(final T s, final String column_name, final int 
> bitmap) throws Exception {
> 
>  if (s instanceof Double)
> return (T) new Double(-1.0);
>  else if (s instanceof Integer)
> return (T) new Integer(-1);
> …..
> }
> 
> Does anyone know if hive supports the generic method ? Or I have to override 
> the evaluate method for each type of input. 
> 
> Thanks 
> 
> Dan
> 




deprecated nextColumnsBatch in RCFile.Reader

2014-07-30 Thread Mattar, Marwan
Hi,

Does anyone know why nextColumnsBatch() in RCFile.Reader has been deprecated? 
I'd like to write a MapReduce job where the mapper receives an entire column at 
time (more specifically, the key is the column ID and the value is the column 
values for that row batch). Using getColumn() and nextColumnsBatch() seems like 
the natural choice. Any pointers are appreciated.

Thanks,
Marwan



RE: Hive Data

2014-07-30 Thread CHEBARO Abdallah
Thank you

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 5:54 PM
To: user@hive.apache.org
Subject: Re: Hive Data

You may please give dbpedia dataset a try - I am sure you won't be disappointed 
:)

regards
Dev
+91 958 305 9899

On Jul 30, 2014, at 6:05 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Till now I don’t have a file. I am willing to search online for a sample 
dataset that contains at least 1 million rows. If you know any link to a sample 
file, it would be very much appreciated.

Thank you.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 3:33 PM
To: user@hive.apache.org
Subject: Re: Hive Data

hive reads the files by the input format defined by the table schema.

By default it reads the TextFile in which columns are separated by "CTRL+A" key

if you have a csv file then you can use a csv serde.
there are lots of such file formats.

what does your file look like?


On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in testing Hive with a huge sample data. Does Hive read all 
data types? Should the file be a table?

Thank you




--
Nitin Pawar



Re: Hive Data

2014-07-30 Thread Devopam Mittra
You may please give dbpedia dataset a try - I am sure you won't be disappointed 
:)

regards
Dev
+91 958 305 9899

> On Jul 30, 2014, at 6:05 PM, CHEBARO Abdallah  
> wrote:
> 
> Till now I don’t have a file. I am willing to search online for a sample 
> dataset that contains at least 1 million rows. If you know any link to a 
> sample file, it would be very much appreciated.
>  
> Thank you.
>  
> From: Nitin Pawar [mailto:nitinpawar...@gmail.com] 
> Sent: Wednesday, July 30, 2014 3:33 PM
> To: user@hive.apache.org
> Subject: Re: Hive Data
>  
> hive reads the files by the input format defined by the table schema. 
>  
> By default it reads the TextFile in which columns are separated by "CTRL+A" 
> key
>  
> if you have a csv file then you can use a csv serde. 
> there are lots of such file formats.
>  
> what does your file look like? 
>  
>  
> 
> On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah 
>  wrote:
> Hello,
>  
> I am interested in testing Hive with a huge sample data. Does Hive read all 
> data types? Should the file be a table?
>  
> Thank you
> 
> 
>  
> -- 
> Nitin Pawar


hive udf cannot recognize generic method

2014-07-30 Thread Dan Fan
Hi there

I am writing a hive UDF function. The input could be string, int, double etc.
The return type is based on the input data type. I was trying to use a generic
method; however, Hive does not seem to recognize it.
Here is the piece of code I have as example.


  public <T> T evaluate(final T s, final String column_name, final int bitmap)
      throws Exception {

    if (s instanceof Double)
      return (T) new Double(-1.0);
    else if (s instanceof Integer)
      return (T) new Integer(-1);

    …..

  }


Does anyone know if Hive supports generic methods? Or do I have to override
the evaluate method for each type of input?


Thanks


Dan



Re: Exception in Hive with SMB join and Parquet

2014-07-30 Thread Suma Shivaprasad
Retried with hive.optimize.sort.dynamic.partition=false. Still seeing the
same issue.

Thanks
Suma


On Wed, Jul 30, 2014 at 6:55 PM, Nitin Pawar 
wrote:

> what's the value of the variable hive.optimize.sort.dynamic.partition
>
> can you try disabling it if it on?
>
>
> On Wed, Jul 30, 2014 at 6:43 PM, Suma Shivaprasad <
> sumasai.shivapra...@gmail.com> wrote:
>
>> Am using 0.13.0 version of hive with parquet table having 34 columns with 
>> the following props while creating the table
>>
>>
>> *CLUSTERED BY (udid)  SORTED BY (udid ASC) INTO 256 BUCKETS
>> STORED as PARQUET
>> TBLPROPERTIES ("parquet.compression"="SNAPPY"); *
>>
>> The query I am running is
>>
>>
>> *set hive.optimize.bucketmapjoin = true;
>> set hive.optimize.bucketmapjoin.sortedmerge = true;
>> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
>> set hive.mapjoin.smalltable.filesize=2;
>> set hive.vectorized.execution.enabled = true;*
>>
>>
>> *set hive.stats.fetch.column.stats=true;
>> set hive.stats.collect.tablekeys=true;
>> set hive.stats.reliable=true;*
>>
>> *select sum(rev),sum(adimp)
>>  from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id 
>> 
>>  where dt = '..' and hour='..'
>>  and dm.age.source = '..'
>>  and dm.age.id  IN ('..')
>>  group by rr.udid;*
>>
>>
>> with both user_rr_parq and user_domain_parq both clustered and sorted by 
>> same join key
>>
>> *Exception in Mapper logs*
>>
>> 2014-07-30 12:44:08,577 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
>> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>
>>
>>
>> 2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running 
>> child
>> java.lang.RuntimeException: 
>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
>> processing row 
>> {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
>>  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:396)
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>  at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
>> Error while processing row 
>> {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
>>  at 
>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>>  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>  ... 8 more*Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
>> java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
>>  at 
>> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
>>  at 
>> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
>>  at 
>> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
>>  at 
>> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>  at 
>> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>  at 
>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>  ... 9 more*Caused by: java.io.IOException: 
>> java.lang.IndexOutOfBoundsException

Re: Exception in Hive with SMB join and Parquet

2014-07-30 Thread Nitin Pawar
What's the value of the variable hive.optimize.sort.dynamic.partition?

Can you try disabling it if it is on?


On Wed, Jul 30, 2014 at 6:43 PM, Suma Shivaprasad <
sumasai.shivapra...@gmail.com> wrote:

> Am using 0.13.0 version of hive with parquet table having 34 columns with the 
> following props while creating the table
>
>
> *CLUSTERED BY (udid)  SORTED BY (udid ASC) INTO 256 BUCKETS
> STORED as PARQUET
> TBLPROPERTIES ("parquet.compression"="SNAPPY"); *
>
> The query I am running is
>
>
> *set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> set hive.mapjoin.smalltable.filesize=2;
> set hive.vectorized.execution.enabled = true;*
>
>
> *set hive.stats.fetch.column.stats=true;
> set hive.stats.collect.tablekeys=true;
> set hive.stats.reliable=true;*
>
> *select sum(rev),sum(adimp)
>  from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id 
> 
>  where dt = '..' and hour='..'
>  and dm.age.source = '..'
>  and dm.age.id  IN ('..')
>  group by rr.udid;*
>
>
> with both user_rr_parq and user_domain_parq both clustered and sorted by same 
> join key
>
> *Exception in Mapper logs*
>
> 2014-07-30 12:44:08,577 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>
>
> 2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
> {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
> {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>   ... 8 more*Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>   ... 9 more*Caused by: java.io.IOException: 
> java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:794)
>  

Exception in Hive with SMB join and Parquet

2014-07-30 Thread Suma Shivaprasad
I am using Hive 0.13.0 with a Parquet table that has 34 columns, created
with the following props:


*CLUSTERED BY (udid)  SORTED BY (udid ASC) INTO 256 BUCKETS
STORED as PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY"); *

The query I am running is


*set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.mapjoin.smalltable.filesize=2;
set hive.vectorized.execution.enabled = true;*


*set hive.stats.fetch.column.stats=true;
set hive.stats.collect.tablekeys=true;
set hive.stats.reliable=true;*

*select sum(rev),sum(adimp)
 from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id 
 where dt = '..' and hour='..'
 and dm.age.source = '..'
 and dm.age.id  IN ('..')
 group by rr.udid;*


with both user_rr_parq and user_domain_parq clustered and sorted by the
same join key

*Exception in Mapper logs*

2014-07-30 12:44:08,577 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1

2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row
{"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row
{"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more*Caused by:
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more*Caused by: java.io.IOException:
java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:794)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771)
... 16 more*Caused by: java.lang.IndexOutOfBoundsException: Index: 29, 
Size: 5*
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
   

RE: Hive Data

2014-07-30 Thread CHEBARO Abdallah
So far I don't have a file. I am willing to search online for a sample
dataset that contains at least 1 million rows. If you know of any link to a
sample file, it would be very much appreciated.

Thank you.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 3:33 PM
To: user@hive.apache.org
Subject: Re: Hive Data

hive reads the files by the input format defined by the table schema.

By default it reads the TextFile in which columns are separated by "CTRL+A" key

if you have a csv file then you can use a csv serde.
there are lots of such file formats.

what does your file look like?


On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in testing Hive with a huge sample data. Does Hive read all 
data types? Should the file be a table?

Thank you




--
Nitin Pawar


Re: Hive Data

2014-07-30 Thread Nitin Pawar
Hive reads files using the input format defined by the table schema.

By default it reads TextFile, in which columns are separated by the "CTRL+A"
key.

If you have a CSV file, you can use a CSV SerDe;
there are lots of such file formats.

What does your file look like?
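
For a simple comma-separated file, a minimal sketch (table name, columns, and
path are hypothetical; a dedicated CSV SerDe would additionally handle quoted
fields):

CREATE TABLE sample_csv (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/tmp/sample.csv' INTO TABLE sample_csv;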



On Wed, Jul 30, 2014 at 5:54 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  Hello,
>
>
>
> I am interested in testing Hive with a huge sample data. Does Hive read
> all data types? Should the file be a table?
>
>
>
> Thank you
>



-- 
Nitin Pawar


Hive Data

2014-07-30 Thread CHEBARO Abdallah
Hello,

I am interested in testing Hive with a huge sample dataset. Does Hive read all
data types? Should the file be a table?

Thank you


RE: SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
Thank you

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 2:57 PM
To: user@hive.apache.org
Subject: Re: SELECT specific data

If you have a defined table, then loading partial columns will be easiest 
handled with inserting the rest columns with NULL value after mapping your 
partial column file as an external table.

regards
Devopam

On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
I am only using Hive and hadoop, nothing more.

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 12:15 PM

To: user@hive.apache.org
Subject: Re: SELECT specific data

Are you using any tool to load data ? If yes, then the ETL tool will provide 
you such options.
If not, then please explore unix file processing/external table route.

On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

Thank you for your reply.

Consider we have data divided into 5 columns (col1, col2, col3, col4, col5).
So I can’t load directly col1, col3 and col5?
If I can’t do it directly, can you provide me with an alternate solution?

Thank you.

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 11:37 AM
To: user@hive.apache.org
Subject: Re: SELECT specific data

you mean just by writing query then I think no.

But if you want to read only first 3 columns of the data then it would work 
with just a single table and load data into

On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without create a second table?

Thank you




--
Nitin Pawar




--
Devopam Mittra
Life and Relations are not binary




--
Devopam Mittra
Life and Relations are not binary


Re: SELECT specific data

2014-07-30 Thread Devopam Mittra
If you have a defined table, the easiest way to load partial columns is to
map your partial-column file as an external table and then insert the
remaining columns as NULL values.
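
A sketch of that route (untested; table names, columns, and path are
hypothetical - here the raw file carries col1, col3 and col5, and the target
table full_table has five columns):

CREATE EXTERNAL TABLE staging_partial (
  col1 STRING,
  col3 STRING,
  col5 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/partial';

INSERT OVERWRITE TABLE full_table
SELECT col1, CAST(NULL AS STRING), col3, CAST(NULL AS STRING), col5
FROM staging_partial;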

regards
Devopam


On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  I am only using Hive and hadoop, nothing more.
>
>
>
> *From:* Devopam Mittra [mailto:devo...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 12:15 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> Are you using any tool to load data ? If yes, then the ETL tool will
> provide you such options.
>
> If not, then please explore unix file processing/external table route.
>
>
>
> On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> Thank you for your reply.
>
>
>
> Consider we have data divided into 5 columns (col1, col2, col3, col4,
> col5).
>
> So I can’t load directly col1, col3 and col5?
>
> If I can’t do it directly, can you provide me with an alternate solution?
>
>
>
> Thank you.
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 11:37 AM
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> you mean just by writing query then I think no.
>
>
>
> But if you want to read only first 3 columns of the data then it would
> work with just a single table and load data into
>
>
>
> On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> I am interested in selecting specific data from a source and loading it to
> a table. For example, if I have 5 columns in my dataset, I want to load 3
> columns of it. Is it possible to do it without create a second table?
>
>
>
> Thank you
>
>
>
>
>
>
> --
> Nitin Pawar
>
>
>
>
>
>
> --
> Devopam Mittra
> Life and Relations are not binary
>



-- 
Devopam Mittra
Life and Relations are not binary


RE: SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
Thank you very much, your response was very helpful

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 12:53 PM
To: user@hive.apache.org
Subject: Re: SELECT specific data

Please check another mail i sent right after that.
my bad had hit send button too soon without reading the mail.

I will rephrase

In hive to process the data, you will need the table created and data loaded to 
the table.
You can not process a file without loading it into a table.

If you want to do that and do not want to create a temporary table in hive with 
full columns from file then options available to you are
1) simple  unix tools like awk or sed or cut
2) write a pig script
3) write your own mapreduce code


On Wed, Jul 30, 2014 at 3:09 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
“With hive, without creating a table with full data, you can do intermediate 
processing like select only few columns and write into another table”. How can 
I do this process?

Thank you alot!

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 12:37 PM

To: user@hive.apache.org
Subject: Re: SELECT specific data

sorry hit send too soon ..
I mean without creating intermediate tables, in hive you can process the file 
directly

On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar 
<nitinpawar...@gmail.com> wrote:
With hive, without creating a table with full data, you can do intermediate 
processing like select only few columns and write into another table,

If this is something one time then you can take a look at awk or cut commands 
in linux and generate those files only.

On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
I am only using Hive and hadoop, nothing more.

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 12:15 PM

To: user@hive.apache.org
Subject: Re: SELECT specific data

Are you using any tool to load data ? If yes, then the ETL tool will provide 
you such options.
If not, then please explore unix file processing/external table route.

On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

Thank you for your reply.

Consider we have data divided into 5 columns (col1, col2, col3, col4, col5).
So I can’t load directly col1, col3 and col5?
If I can’t do it directly, can you provide me with an alternate solution?

Thank you.

From: Nitin Pawar 
[mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 11:37 AM
To: user@hive.apache.org
Subject: Re: SELECT specific data

you mean just by writing query then I think no.

But if you want to read only first 3 columns of the data then it would work 
with just a single table and load data into

On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah 
<abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without create a second table?

Thank you




--
Nitin Pawar




--
Devopam Mittra
Life and Relations are not binary

***

This e-mail contains information for the intended recipient only. It may 
contain proprietary material or confidential information. If you are not the 
intended recipient you are not authorised to distribute, copy or use this 
e-mail or any attachment to it. Murex cannot guarantee that it is virus free 
and accepts no responsibility for any loss or damage arising from its use. If 
you have received this e-mail in error please notify immediately th

Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
Please check another mail I sent right after that.
My bad, I hit the send button too soon without reading the mail.

I will rephrase

In Hive, to process the data you will need the table created and the data
loaded into the table.
You cannot process a file without loading it into a table.

If you want to do that and do not want to create a temporary table in Hive
with the full set of columns from the file, then the options available to you are:
1) simple unix tools like awk, sed or cut
2) write a Pig script
3) write your own MapReduce code



On Wed, Jul 30, 2014 at 3:09 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  “With Hive, without creating a table with full data, you can do
> intermediate processing like select only a few columns and write into another
> table”. How can I do this?
>
>
>
> Thank you a lot!
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 12:37 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> Sorry, hit send too soon...
>
> I mean, without creating intermediate tables, in Hive you can process the
> file directly.
>
>
>
> On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar wrote:
>
> With Hive, without creating a table with full data, you can do
> intermediate processing like select only a few columns and write into another
> table.
>
>
>
> If this is a one-time thing then you can take a look at the awk or cut
> commands in Linux and generate the trimmed files directly.
>
>
>
> On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> I am only using Hive and hadoop, nothing more.
>
>
>
> *From:* Devopam Mittra [mailto:devo...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 12:15 PM
>
>
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> Are you using any tool to load data ? If yes, then the ETL tool will
> provide you such options.
>
> If not, then please explore unix file processing/external table route.
>
>
>
> On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> Thank you for your reply.
>
>
>
> Consider we have data divided into 5 columns (col1, col2, col3, col4,
> col5).
>
> So I can’t load directly col1, col3 and col5?
>
> If I can’t do it directly, can you provide me with an alternate solution?
>
>
>
> Thank you.
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 11:37 AM
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> You mean just by writing a query? Then I think no.
>
>
>
> But if you want to read only the first 3 columns of the data, then it would
> work with just a single table to load the data into.
>
>
>
> On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> I am interested in selecting specific data from a source and loading it to
> a table. For example, if I have 5 columns in my dataset, I want to load 3
> columns of it. Is it possible to do it without creating a second table?
>
>
>
> Thank you
>
>
>
>
>
>
> --
> Nitin Pawar
>
>
>
>
>
>
> --
> Devopam Mittra
> Life and Relations are not binary
>
>
>
>
>
>
> --
> Nitin Pawar
>
>
>
>
>
> --
> Nitin Pawar

about collect_set and ordering

2014-07-30 Thread Furcy Pin
Hi all,

I just wanted to point out a little gotcha we hit while using the
collect_set UDF:

You should not perform a GROUP BY directly over a collect_set(...), because
the set is cast as an array and is not necessarily sorted.

For example, we ran a query looking like this:

SELECT
    set,
    COUNT(1) AS nb
FROM
(
    SELECT
        colA,
        collect_set(colB) AS set
    FROM db.table
    GROUP BY colA
) T
GROUP BY set
;

and got:

[A] 10205
[B] 93856
[A,B] 34865
[B,A] 48324

We had to replace it with a sort_array(collect_set(...)).
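
For reference, here is a minimal sketch of the rewritten query (reusing the
hypothetical db.table with columns colA and colB from above); sorting the
array gives the grouping key a deterministic order:

SELECT
    sorted_set,
    COUNT(1) AS nb
FROM
(
    SELECT
        colA,
        -- sort_array puts the collected elements in a deterministic order
        sort_array(collect_set(colB)) AS sorted_set
    FROM db.table
    GROUP BY colA
) T
GROUP BY sorted_set
;

With the array sorted, [A,B] and [B,A] become the same key and collapse into a
single row.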

I just wanted to point out that perhaps this subtlety should be mentioned
in the doc of the collect_set UDF...

As a corollary, do you guys think a collect_sorted_set using a TreeSet
would be useful (and/or more efficient than using sort_array(collect_set))?

Regards,

Furcy


RE: SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
“With Hive, without creating a table with full data, you can do intermediate
processing like select only a few columns and write into another table”. How can
I do this?

Thank you a lot!

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 12:37 PM
To: user@hive.apache.org
Subject: Re: SELECT specific data

Sorry, hit send too soon...
I mean, without creating intermediate tables, in Hive you can process the file
directly.

On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
With Hive, without creating a table with full data, you can do intermediate
processing like select only a few columns and write into another table.

If this is a one-time thing then you can take a look at the awk or cut commands
in Linux and generate the trimmed files directly.

On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
I am only using Hive and hadoop, nothing more.

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 12:15 PM

To: user@hive.apache.org
Subject: Re: SELECT specific data

Are you using any tool to load data? If yes, then the ETL tool will provide
you such options.
If not, then please explore the unix file processing/external table route.

On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
Hello,

Thank you for your reply.

Consider we have data divided into 5 columns (col1, col2, col3, col4, col5).
So I can’t load directly col1, col3 and col5?
If I can’t do it directly, can you provide me with an alternate solution?

Thank you.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 11:37 AM
To: user@hive.apache.org
Subject: Re: SELECT specific data

You mean just by writing a query? Then I think no.

But if you want to read only the first 3 columns of the data, then it would work
with just a single table to load the data into.

On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without creating a second table?

Thank you




--
Nitin Pawar




--
Devopam Mittra
Life and Relations are not binary




--
Nitin Pawar



--
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
With Hive, without creating a table with full data, you can do intermediate
processing like select only a few columns and write into another table.

If this is a one-time thing then you can take a look at the awk or cut
commands in Linux and generate the trimmed files directly.
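
A minimal sketch of that two-step approach, with hypothetical table, column and
path names; the staging table mirrors the full file and the second table keeps
only the wanted columns:

-- staging table over the full five-column layout (all names here are hypothetical)
CREATE TABLE staging_full (col1 STRING, col2 STRING, col3 STRING, col4 STRING, col5 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/path/to/file.csv' INTO TABLE staging_full;

-- write only the columns of interest into a second table
CREATE TABLE narrow AS
SELECT col1, col3, col5 FROM staging_full;

Once narrow is populated, staging_full can be dropped.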


On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  I am only using Hive and hadoop, nothing more.
>
>
>
> *From:* Devopam Mittra [mailto:devo...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 12:15 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> Are you using any tool to load data ? If yes, then the ETL tool will
> provide you such options.
>
> If not, then please explore unix file processing/external table route.
>
>
>
> On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> Thank you for your reply.
>
>
>
> Consider we have data divided into 5 columns (col1, col2, col3, col4,
> col5).
>
> So I can’t load directly col1, col3 and col5?
>
> If I can’t do it directly, can you provide me with an alternate solution?
>
>
>
> Thank you.
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 11:37 AM
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> You mean just by writing a query? Then I think no.
>
>
>
> But if you want to read only the first 3 columns of the data, then it would
> work with just a single table to load the data into.
>
>
>
> On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> I am interested in selecting specific data from a source and loading it to
> a table. For example, if I have 5 columns in my dataset, I want to load 3
> columns of it. Is it possible to do it without creating a second table?
>
>
>
> Thank you
>
>
>
>
>
>
> --
> Nitin Pawar
>
>
>
>
>
>
> --
> Devopam Mittra
> Life and Relations are not binary
>
>



-- 
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
Sorry, hit send too soon...
I mean, without creating intermediate tables, in Hive you can process the
file directly.


On Wed, Jul 30, 2014 at 3:06 PM, Nitin Pawar wrote:

> With Hive, without creating a table with full data, you can do
> intermediate processing like select only a few columns and write into another
> table.
>
> If this is a one-time thing then you can take a look at the awk or cut
> commands in Linux and generate the trimmed files directly.
>
>
> On Wed, Jul 30, 2014 at 2:49 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
>>  I am only using Hive and hadoop, nothing more.
>>
>>
>>
>> *From:* Devopam Mittra [mailto:devo...@gmail.com]
>> *Sent:* Wednesday, July 30, 2014 12:15 PM
>>
>> *To:* user@hive.apache.org
>> *Subject:* Re: SELECT specific data
>>
>>
>>
>> Are you using any tool to load data ? If yes, then the ETL tool will
>> provide you such options.
>>
>> If not, then please explore unix file processing/external table route.
>>
>>
>>
>> On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <
>> abdallah.cheb...@murex.com> wrote:
>>
>> Hello,
>>
>>
>>
>> Thank you for your reply.
>>
>>
>>
>> Consider we have data divided into 5 columns (col1, col2, col3, col4,
>> col5).
>>
>> So I can’t load directly col1, col3 and col5?
>>
>> If I can’t do it directly, can you provide me with an alternate solution?
>>
>>
>>
>> Thank you.
>>
>>
>>
>> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
>> *Sent:* Wednesday, July 30, 2014 11:37 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: SELECT specific data
>>
>>
>>
>> You mean just by writing a query? Then I think no.
>>
>>
>>
>> But if you want to read only the first 3 columns of the data, then it would
>> work with just a single table to load the data into.
>>
>>
>>
>> On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
>> abdallah.cheb...@murex.com> wrote:
>>
>> Hello,
>>
>>
>>
>> I am interested in selecting specific data from a source and loading it
>> to a table. For example, if I have 5 columns in my dataset, I want to load
>> 3 columns of it. Is it possible to do it without creating a second table?
>>
>>
>>
>> Thank you
>>
>>
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>>
>>
>>
>>
>>
>> --
>> Devopam Mittra
>> Life and Relations are not binary
>>
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar


RE: SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
I am only using Hive and Hadoop, nothing more.

From: Devopam Mittra [mailto:devo...@gmail.com]
Sent: Wednesday, July 30, 2014 12:15 PM
To: user@hive.apache.org
Subject: Re: SELECT specific data

Are you using any tool to load data? If yes, then the ETL tool will provide
you such options.
If not, then please explore the unix file processing/external table route.

On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
Hello,

Thank you for your reply.

Consider we have data divided into 5 columns (col1, col2, col3, col4, col5).
So I can’t load directly col1, col3 and col5?
If I can’t do it directly, can you provide me with an alternate solution?

Thank you.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 11:37 AM
To: user@hive.apache.org
Subject: Re: SELECT specific data

You mean just by writing a query? Then I think no.

But if you want to read only the first 3 columns of the data, then it would work
with just a single table to load the data into.

On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without creating a second table?

Thank you




--
Nitin Pawar




--
Devopam Mittra
Life and Relations are not binary


Re: SELECT specific data

2014-07-30 Thread Devopam Mittra
Are you using any tool to load data? If yes, then the ETL tool will
provide you such options.
If not, then please explore the unix file processing/external table route.
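
For the external-table route, a minimal sketch, assuming a comma-delimited file
already sitting in a hypothetical HDFS directory:

-- Hive reads the file in place; nothing is copied into the warehouse
-- (table name, columns and location are hypothetical)
CREATE EXTERNAL TABLE raw_data (col1 STRING, col2 STRING, col3 STRING, col4 STRING, col5 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/raw_data';

-- project only the wanted columns
SELECT col1, col3, col5 FROM raw_data;

Dropping an external table later removes only the metadata; the file itself
stays where it is.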


On Wed, Jul 30, 2014 at 2:09 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  Hello,
>
>
>
> Thank you for your reply.
>
>
>
> Consider we have data divided into 5 columns (col1, col2, col3, col4,
> col5).
>
> So I can’t load directly col1, col3 and col5?
>
> If I can’t do it directly, can you provide me with an alternate solution?
>
>
>
> Thank you.
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 30, 2014 11:37 AM
> *To:* user@hive.apache.org
> *Subject:* Re: SELECT specific data
>
>
>
> You mean just by writing a query? Then I think no.
>
>
>
> But if you want to read only the first 3 columns of the data, then it would
> work with just a single table to load the data into.
>
>
>
> On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
> abdallah.cheb...@murex.com> wrote:
>
> Hello,
>
>
>
> I am interested in selecting specific data from a source and loading it to
> a table. For example, if I have 5 columns in my dataset, I want to load 3
> columns of it. Is it possible to do it without creating a second table?
>
>
>
> Thank you
>
>
>
>
>
>
> --
> Nitin Pawar
>
>



-- 
Devopam Mittra
Life and Relations are not binary


RE: SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
Hello,

Thank you for your reply.

Consider we have data divided into 5 columns (col1, col2, col3, col4, col5).
So I can’t load directly col1, col3 and col5?
If I can’t do it directly, can you provide me with an alternate solution?

Thank you.

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Wednesday, July 30, 2014 11:37 AM
To: user@hive.apache.org
Subject: Re: SELECT specific data

You mean just by writing a query? Then I think no.

But if you want to read only the first 3 columns of the data, then it would work
with just a single table to load the data into.

On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <abdallah.cheb...@murex.com> wrote:
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without creating a second table?

Thank you




--
Nitin Pawar


Re: SELECT specific data

2014-07-30 Thread Nitin Pawar
You mean just by writing a query? Then I think no.

But if you want to read only the first 3 columns of the data, then it would work
with just a single table to load the data into.
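
A minimal sketch of that single-table approach, assuming the wanted columns are
the first three fields of a comma-delimited file (names and path are
hypothetical); with the default text SerDe, Hive simply ignores any trailing
fields beyond the declared columns:

CREATE TABLE first_three (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- each line may carry five fields; the last two are ignored when the table is read
LOAD DATA LOCAL INPATH '/path/to/file.csv' INTO TABLE first_three;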


On Wed, Jul 30, 2014 at 1:47 PM, CHEBARO Abdallah <
abdallah.cheb...@murex.com> wrote:

>  Hello,
>
>
>
> I am interested in selecting specific data from a source and loading it to
> a table. For example, if I have 5 columns in my dataset, I want to load 3
> columns of it. Is it possible to do it without creating a second table?
>
>
>
> Thank you
>
>



-- 
Nitin Pawar


SELECT specific data

2014-07-30 Thread CHEBARO Abdallah
Hello,

I am interested in selecting specific data from a source and loading it to a 
table. For example, if I have 5 columns in my dataset, I want to load 3 columns 
of it. Is it possible to do it without creating a second table?

Thank you