Re: Hive_CSV

2016-03-09 Thread Jörn Franke
The data is already in the CSV file, so it does not matter for querying. It is 
recommended to convert it to ORC or Parquet for querying.
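
A minimal HiveQL sketch of such a conversion, assuming a CSV-backed table named csvtab with the columns from Daniel's example. The target name sports_orc is hypothetical, and STORED AS PARQUET could be used in place of ORC:

```sql
-- Assumes csvtab is already defined over the CSV file with the columns
-- irrelevant, sportName and sportType (names taken from Daniel's example).
-- sports_orc is a hypothetical target table name.
CREATE TABLE sports_orc
STORED AS ORC
AS
SELECT sportName, sportType
FROM csvtab;
```

Only the two selected columns are written to the ORC table, so the unwanted first field is not carried over.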

> On 09 Mar 2016, at 19:09, Ajay Chander wrote:
> 
> Daniel, thanks for your time. Does that mean creating two tables, one to hold 
> all the data and another to fetch the required data out of it? If that is 
> the case, I am just concerned about redundant data. Please correct me if 
> I am wrong. Thanks.
> 
>> On Wednesday, March 9, 2016, Daniel Haviv wrote:
>> Hi Ajay,
>> Use the CSV serde to read your file, map all three columns but only select 
>> the relevant ones when you insert:
>> 
>> Create table csvtab (
>> irrelevant string,
>> sportName string,
>> sportType string) ...
>> 
>> Insert into loaded_table select sportName, sportType from csvtab;
>> 
>> Daniel
>> 
>> > On 9 Mar 2016, at 19:43, Ajay Chander wrote:
>> >
>> > Hi Everyone,
>> >
>> > I am looking for a way to ignore the first occurrence of the delimiter 
>> > while loading data from a CSV file into a Hive external table.
>> >
>> > CSV file:
>> >
>> > Xyz, baseball, outdoor
>> >
>> > The Hive table has two columns, sport_name and sport_type, and fields are 
>> > separated by ','.
>> >
>> > I want to load my data into the table such that the load ignores 
>> > everything up to the first delimiter (that is, it skips Xyz) and loads 
>> > the data that follows.
>> >
>> > In the end, my Hive table should have the following data:
>> >
>> > baseball, outdoor
>> >
>> > Any inputs are appreciated. Thank you for your time.


Re: Hive_CSV

2016-03-09 Thread Ajay Chander
Daniel, thanks for your time. Does that mean creating two tables, one to hold
all the data and another to fetch the required data out of it?
If that is the case, I am just concerned about redundant data. Please correct
me if I am wrong. Thanks.

On Wednesday, March 9, 2016, Daniel Haviv wrote:

> Hi Ajay,
> Use the CSV serde to read your file, map all three columns but only select
> the relevant ones when you insert:
>
> Create table csvtab (
> irrelevant string,
> sportName string,
> sportType string) ...
>
> Insert into loaded_table select sportName, sportType from csvtab;
>
> Daniel
>
> > On 9 Mar 2016, at 19:43, Ajay Chander wrote:
> >
> > Hi Everyone,
> >
> > I am looking for a way to ignore the first occurrence of the delimiter
> > while loading data from a CSV file into a Hive external table.
> >
> > CSV file:
> >
> > Xyz, baseball, outdoor
> >
> > The Hive table has two columns, sport_name and sport_type, and fields are
> > separated by ','.
> >
> > I want to load my data into the table such that the load ignores
> > everything up to the first delimiter (that is, it skips Xyz) and loads
> > the data that follows.
> >
> > In the end, my Hive table should have the following data:
> >
> > baseball, outdoor
> >
> > Any inputs are appreciated. Thank you for your time.
>


Re: Hive_CSV

2016-03-09 Thread Ajay Chander
Jörn, thanks for your time. The reason I want to do this is that I don't want
to bring unnecessary data into the table. Each record carries an
unnecessary value.

On Wednesday, March 9, 2016, Jörn Franke wrote:

> Why don't you load all the data and use just two columns for querying?
> Alternatively, use regular expressions.
>
> > On 09 Mar 2016, at 18:43, Ajay Chander wrote:
> >
> > Hi Everyone,
> >
> > I am looking for a way to ignore the first occurrence of the delimiter
> > while loading data from a CSV file into a Hive external table.
> >
> > CSV file:
> >
> > Xyz, baseball, outdoor
> >
> > The Hive table has two columns, sport_name and sport_type, and fields are
> > separated by ','.
> >
> > I want to load my data into the table such that the load ignores
> > everything up to the first delimiter (that is, it skips Xyz) and loads
> > the data that follows.
> >
> > In the end, my Hive table should have the following data:
> >
> > baseball, outdoor
> >
> > Any inputs are appreciated. Thank you for your time.
>


Re: Hive_CSV

2016-03-09 Thread Daniel Haviv
Hi Ajay,
Use the CSV serde to read your file, map all three columns but only select the 
relevant ones when you insert:

Create table csvtab (
irrelevant string,
sportName string,
sportType string) ...

Insert into loaded_table select sportName, sportType from csvtab;

Daniel
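
One way the elided table definition above could be filled in, as a sketch rather than what Daniel necessarily had in mind. It assumes Hive 0.14 or later (for OpenCSVSerde); the location /data/sports_csv and the view name sports are hypothetical. A view is used instead of a second table so that no data is copied:

```sql
-- Hypothetical location: point it at the directory that holds the CSV file.
-- OpenCSVSerde reads every column as a string.
CREATE EXTERNAL TABLE csvtab (
  irrelevant STRING,
  sportName  STRING,
  sportType  STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",")
STORED AS TEXTFILE
LOCATION '/data/sports_csv';

-- Hypothetical view exposing only the wanted columns.
-- trim() drops the space that follows each comma in the sample line.
CREATE VIEW sports AS
SELECT trim(sportName) AS sport_name,
       trim(sportType) AS sport_type
FROM csvtab;
```

Because the table is external and the view stores nothing, the CSV file stays the only copy of the data.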

> On 9 Mar 2016, at 19:43, Ajay Chander wrote:
> 
> Hi Everyone,
> 
> I am looking for a way to ignore the first occurrence of the delimiter while 
> loading data from a CSV file into a Hive external table.
> 
> CSV file: 
> 
> Xyz, baseball, outdoor
> 
> The Hive table has two columns, sport_name and sport_type, and fields are 
> separated by ','.
> 
> I want to load my data into the table such that the load ignores 
> everything up to the first delimiter (that is, it skips Xyz) and loads 
> the data that follows.
> 
> In the end, my Hive table should have the following data:
> 
> baseball, outdoor
> 
> Any inputs are appreciated. Thank you for your time.


Re: Hive_CSV

2016-03-09 Thread Jörn Franke

Why don't you load all the data and use just two columns for querying? 
Alternatively, use regular expressions.
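
A sketch of the regular-expression route using Hive's built-in RegexSerDe, so that the external table itself has only two columns. The table name sports_regex and the location are hypothetical, and the pattern assumes fields never contain embedded commas:

```sql
-- Hypothetical table name and location.
-- The pattern skips the first field, then captures the next two.
-- RegexSerDe expects one capture group per column, and every column
-- must be of type STRING.
CREATE EXTERNAL TABLE sports_regex (
  sport_name STRING,
  sport_type STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^[^,]*, *([^,]*), *(.*)$"
)
STORED AS TEXTFILE
LOCATION '/data/sports_csv';
```

For the sample line Xyz, baseball, outdoor, the two groups capture baseball and outdoor. Lines that do not match the pattern come back as NULL in both columns, so it is worth checking for those after creating the table.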



> On 09 Mar 2016, at 18:43, Ajay Chander wrote:
> 
> Hi Everyone,
> 
> I am looking for a way to ignore the first occurrence of the delimiter while 
> loading data from a CSV file into a Hive external table.
> 
> CSV file: 
> 
> Xyz, baseball, outdoor
> 
> The Hive table has two columns, sport_name and sport_type, and fields are 
> separated by ','.
> 
> I want to load my data into the table such that the load ignores 
> everything up to the first delimiter (that is, it skips Xyz) and loads 
> the data that follows.
> 
> In the end, my Hive table should have the following data:
> 
> baseball, outdoor
> 
> Any inputs are appreciated. Thank you for your time.