Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Ajay Chander
+1 for streaming or a recording. Thanks

On Wed, May 2, 2018 at 10:54 AM Elliot West  wrote:

> +1 for streaming or a recording. Content looks excellent.
>
> On 2 May 2018 at 15:51, dan young  wrote:
>
>> Looks like great talks, will this be streamed anywhere?
>>
>> On Wed, May 2, 2018, 8:48 AM Sahil Takiar  wrote:
>>
>>> Hey Everyone,
>>>
>>> The agenda for the meetup has been set and I'm excited to say we have
>>> lots of interesting talks scheduled! Below is the final agenda; the full
>>> list of abstracts will be sent out soon. If you are planning to attend,
>>> please RSVP on the meetup link so we can get an accurate headcount of
>>> attendees (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>>>
>>> 6:30 - 7:00 PM Networking and Refreshments
>>> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>>>
>>>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>>>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>>>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>>>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>>>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>>>- ORC Column Level Encryption - Owen O’Malley
>>>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>>>- Materialized Views in Hive - Jesus Camacho Rodriguez
>>>
>>> 8:30 PM - 9:00 PM Hive Metastore Panel
>>>
>>>- Moderator: Vihang Karajgaonkar
>>>- Participants:
>>>   - Daniel Dai - Hive Metastore Caching
>>>   - Alan Gates - Hive Metastore Separation
>>>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>>>   Metadata
>>>
>>> The Metastore panel will consist of a short presentation by each
>>> panelist followed by a Q&A session driven by the moderator.
>>>
>>> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
>>> wrote:
>>>
 We still have a few slots open for lightning talks, so if anyone is
 interested in giving a presentation, don't hesitate to reach out!

 If you are planning to attend the meetup, please RSVP on the Meetup
 link (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/)
 so that we can get an accurate headcount for food.

 Thanks!

 --Sahil

 On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
 wrote:

> Hi all,
>
> I'm happy to announce that the Hive community is organizing a Hive
> user group meeting in the Bay Area next month. The details can be found at
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>
> The format of this meetup will be slightly different from previous
> ones. There will be one hour dedicated to lightning talks, followed by a
> group discussion on the future of the Hive Metastore.
>
> We are inviting talk proposals from Hive users as well as developers
> at this time. Please contact either myself (takiar.sa...@gmail.com),
> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
> pv...@cloudera.com) with proposals. We currently have 5 openings.
>
> Please let me know if you have any questions or suggestions.
>
> Thanks,
> Sahil
>



 --
 Sahil Takiar
 Software Engineer
 takiar.sa...@gmail.com | (510) 673-0309

>>>
>>>
>>>
>>>
>>
>


Re: HIVE on Windows

2016-08-24 Thread Ajay Chander
Hi,

Were you able to get Hive up and running on a Windows machine? I have
installed Hadoop on Windows and now I want to install Hive too. I couldn't
find binaries to run on a Windows machine. Can anyone tell me whether it is
possible to run Hive on Windows? Thanks

On Wednesday, May 18, 2016, Me To  wrote:

> Thanks so much for replying :)
>
> So without a distribution, I will not be able to do that?
>
> On Wed, May 18, 2016 at 12:27 PM, Jörn Franke  > wrote:
>
>> Use a distribution, such as Hortonworks
>>
>>
>> On 18 May 2016, at 19:09, Me To > > wrote:
>>
>> Hello,
>>
>> I want to install Hive on my Windows machine but I am unable to find any
>> resources out there. I have been trying to set it up for a month but have
>> been unable to. I have successfully set up Hadoop on my Windows machine.
>> According to this guide
>>
>> https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation
>>
>> There are different steps involved to install and run it on Windows, but
>> where are those steps documented? Please help me with this problem. I have
>> posted this question on almost every forum, including Stack Overflow, but
>> nobody knows the answer.
>>
>> I am using Windows 8 with Hadoop 2.7 running on my desktop. I want to run
>> Hive and Beeline. Please help me.
>>
>> Looking forward to your response.
>>
>> Thank you.
>> Ekta Paliwal
>>
>>
>


ELK_on_Hive

2016-08-15 Thread Ajay Chander
Hi Team,

I would like to get your opinion on implementing ELK on top of Hive. Right
now we have some tables in Hive, and we would like to visualize those
tables and generate reports. Is ELK known to handle this kind of workload,
or what would you recommend? Thanks

-
Aj


Re: External_Tables_Disadvantages

2016-06-28 Thread Ajay Chander
Hi Team, Any insights on this one? Thank you

On Monday, June 27, 2016, Ajay Chander  wrote:

> Hi Everyone,
>
> I would like to know the disadvantages of using external tables in Hive. I
> was told that "managing security with Sentry will be very limited for
> external tables". Is that true? Can someone explain it, please? Thank you.
>
> Regards,
> Aj
>


External_Tables_Disadvantages

2016-06-27 Thread Ajay Chander
Hi Everyone,

I would like to know the disadvantages of using external tables in Hive. I
was told that "managing security with Sentry will be very limited for
external tables". Is that true? Can someone explain it, please? Thank you.

Regards,
Aj


Re: Sqoop_Sql_blob_types

2016-04-27 Thread Ajay Chander
Thanks, Jörn! For now I don't want to move the data directly into Hive. My
SQL database contains a table 'test' with 2 columns (file_name char(100),
file_data longblob). The 'file_data' column may contain XML-formatted data
or pipe-delimited data, and it is a huge amount of data. Right now I am
considering loading the data into HDFS under some directory structure using
Sqoop. I just want to make sure there is no possibility of data loss, and
to learn any best practices that need to be followed. Thanks for your time.

On Wednesday, April 27, 2016, Jörn Franke  wrote:

> You could try as binary. Is it just for storing the blobs or for doing
> analyzes on them? In the first case you may think about storing them as
> files in HDFS and including in hive just a string containing the file name
> (to make analysis on the other data faster). In the later case you should
> think about an optimal analysis format in Hive.
>
> > On 27 Apr 2016, at 22:13, Ajay Chander  > wrote:
> >
> > Hi Everyone,
> >
> > I have a table which has few columns as blob types with huge data. Is
> there any best way to 'sqoop import' it to hive tables with out losing any
> data ? Any help is highly appreciated.
> >
> > Thank you!
>


Re: Sqoop_Sql_blob_types

2016-04-27 Thread Ajay Chander
Mich, thanks for looking into this. At this point in time the source is
MySQL.

Thank you!

On Wednesday, April 27, 2016, Mich Talebzadeh 
wrote:

> Is the source of data Oracle?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 27 April 2016 at 21:13, Ajay Chander  > wrote:
>
>> Hi Everyone,
>>
>> I have a table which has few columns as blob types with huge data. Is
>> there any best way to 'sqoop import' it to hive tables with out losing any
>> data ? Any help is highly appreciated.
>>
>> Thank you!
>>
>
>


Sqoop_Sql_blob_types

2016-04-27 Thread Ajay Chander
Hi Everyone,

I have a table with a few blob-type columns holding huge amounts of data.
Is there a good way to 'sqoop import' it into Hive tables without losing
any data? Any help is highly appreciated.

Thank you!


Re: Data_encyption(rdbms_to_hive)

2016-04-19 Thread Ajay Chander
This is to understand whether Sqoop has the capability to encrypt data in
transit and protect it from man-in-the-middle attacks. Any pointers are
appreciated. Thanks
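For the in-transit question, Sqoop itself defers to the JDBC driver; with MySQL, TLS can be requested through connection properties. A hedged sketch, in which the host, credentials, table, and paths are placeholders:

```shell
# Hypothetical sketch: ask the MySQL JDBC driver to encrypt the connection.
# useSSL/requireSSL are MySQL Connector/J properties; verify them against
# the driver version actually deployed on the cluster.
sqoop import \
  --connect "jdbc:mysql://dbhost:3306/sourcedb?useSSL=true&requireSSL=true" \
  --username etl \
  --password-file /user/etl/.dbpass \
  --table accounts \
  --target-dir /data/landing/accounts
```

Note this only covers the RDBMS-to-mapper hop; encryption between cluster nodes and at rest in HDFS is configured in Hadoop, not in Sqoop.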

On Tuesday, April 19, 2016, Ajay Chander  wrote:

> Hi Everyone,
>
> I am just trying to understand whether there is any default data
> encryption/decryption involved when we Sqoop data from an RDBMS into Hive.
> If so, can someone point me to material that covers it? Thanks for your
> time!
>
> Regards,
> Aj
>


Data_encyption(rdbms_to_hive)

2016-04-19 Thread Ajay Chander
Hi Everyone,

I am just trying to understand whether there is any default data
encryption/decryption involved when we Sqoop data from an RDBMS into Hive.
If so, can someone point me to material that covers it? Thanks for your
time!

Regards,
Aj


Hive support in oozie shell action

2016-04-06 Thread Ajay Chander
Hi Everyone,

I am trying to execute a Hive script in an Oozie shell action like the one
below:




<workflow-app name="shell-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-node"/>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>/user/hue/oozie/workspaces/hue-oozie-1459982183.04/b.sh</exec>
            <file>/user/hue/oozie/workspaces/hue-oozie-1459982183.04/b.sh#b.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <end name="end"/>
</workflow-app>









b.sh:

#!/bin/bash

hive -e "LOAD DATA INPATH '/user/test/landing_zone/file1 _*' INTO TABLE
mydb.Test"

It throws 'Launcher Error reason main class
[org.apache.oozie.action.hadoop.ShellMain] exit code [1] '

Any pointers ? Thanks for your time.
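Exit code [1] from ShellMain usually just means the script itself failed on whichever worker node ran it. A hedged revision of b.sh, assuming the usual culprits (hive missing from the action's PATH, or the script failing silently); the hive path below is a placeholder:

```shell
#!/bin/bash
# Sketch only: shell actions run on an arbitrary cluster node, so make the
# environment explicit. Adjust the hive path to the cluster layout, and run
# kinit first if the cluster is Kerberized.
set -euo pipefail
export PATH="$PATH:/usr/bin:/usr/lib/hive/bin"

hive -e "LOAD DATA INPATH '/user/test/landing_zone/file1 _*' INTO TABLE mydb.Test"
```

With set -e the script's own exit status propagates, and the real error lands in the Oozie launcher's stdout/stderr logs, which is usually the fastest place to look.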


Re: Hive_primary_key

2016-03-30 Thread Ajay Chander
Thank you for the quick reply. I will track it there.

On Wednesday, March 30, 2016, Hari Sivarama Subramaniyan <
hsubramani...@hortonworks.com> wrote:

> This is an on-going work, can be tracked under the tasks in
> https://issues.apache.org/jira/browse/HIVE-13076​
>
>
> Thanks
>
> Hari
> ----------
> *From:* Ajay Chander  >
> *Sent:* Wednesday, March 30, 2016 11:33 AM
> *To:* user@hive.apache.org
> 
> *Subject:* Hive_primary_key
>
> Hi Users,
>
> Just wanted to check if the support for defining primary
> keys(constraints) while creating a hive external tables is available yet?
>
> Thank you!
>


Hive_primary_key

2016-03-30 Thread Ajay Chander
Hi Users,

Just wanted to check whether support for defining primary keys
(constraints) when creating Hive external tables is available yet?

Thank you!


Re: De-identification_in Hive

2016-03-20 Thread Ajay Chander
Thanks for your time Mich! I will try this one out.

On Thursday, March 17, 2016, Mich Talebzadeh 
wrote:

> Then probably the easiest option would be in INSERT/SELECT from external
> table to target table and make that column NULL
>
> Check the VAT column here that I made it NULL
>
> DROP TABLE IF EXISTS stg_t2;
> CREATE EXTERNAL TABLE stg_t2 (
>  INVOICENUMBER string
> ,PAYMENTDATE string
> ,NET string
> ,VAT string
> ,TOTAL string
> )
> COMMENT 'from csv file from excel sheet '
> ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> STORED AS TEXTFILE
> LOCATION '/data/stg/table2'
> TBLPROPERTIES ("skip.header.line.count"="1")
> ;
> --3)
> DROP TABLE IF EXISTS t2;
> CREATE TABLE t2 (
>  INVOICENUMBER INT
> ,PAYMENTDATE   timestamp
> ,NET           DECIMAL(20,2)
> ,VAT           DECIMAL(20,2)
> ,TOTAL         DECIMAL(20,2)
> )
> COMMENT 'from csv file from excel sheet '
> CLUSTERED BY (INVOICENUMBER) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="ZLIB",
> "transactional"="true")
> ;
> --4) Put data in target table. do the conversion and ignore empty rows
> INSERT INTO TABLE t2
> SELECT
>   INVOICENUMBER
> , CAST(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy')*1000 as timestamp)
> , CAST(REGEXP_REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2))
> , NULL
> , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
> FROM
> stg_t2
> WHERE
> --INVOICENUMBER > 0 AND
> CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0
> -- Exclude empty rows
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 March 2016 at 15:32, Ajay Chander  > wrote:
>
>> Mich, I am okay with replacing the columns data with some characters
>> like asterisk. Thanks
>>
>>
>> On Thursday, March 17, 2016, Mich Talebzadeh > > wrote:
>>
>>> Hi Ajay,
>>>
>>> Do you want to be able to unmask it (at any time) or just have it
>>> totally scrambled (for example replace the column with random characters)
>>> in Hive?
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 March 2016 at 15:14, Ajay Chander  wrote:
>>>
>>>> Mich, thanks for looking into this. I have a 'csvfile.txt' on HDFS. I
>>>> have created an external table 'xyz' to load that data into it. One of
>>>> the columns, 'ssn', needs to be masked. Is there any built-in function
>>>> that I could use?
>>>>
>>>>
>>>> On Thursday, March 17, 2016, Mich Talebzadeh 
>>>> wrote:
>>>>
>>>>> Are you loading your CSV file from an External table into Hive table.?
>>>>>
>>>>> Basically you want to scramble that column before putting into Hive
>>>>> table?
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn:
>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 17 March 2016 at 14:37, Ajay Chander  wrote:
>>>>>
>>>>>> Tustin, Is there anyway I can deidentify it in hive ?
>>>>>>
>>>>>>
>>>>>> On Thursday, March 17, 2016, Marcin Tustin 
>>>>>> wrote:
>>>>>>
>>>>>>> This is a classic transform-load problem. You'll want to anonymise
>>>>>>> it once before making it available for analysis.
>>>>>>>
>>>>>>> On Thursday, March 17, 2016, Ajay Chander 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Everyone,
>>>>>>>>
>>>>>>>> I have a csv.file which has some sensitive data in a particular
>>>>>>>> column in it.  Now I have to create a table in hive and load the data 
>>>>>>>> into
>>>>>>>> it. But when loading the data I have to make sure that the data is 
>>>>>>>> masked.
>>>>>>>> Is there any built in function is used ch supports this or do I have to
>>>>>>>> write UDF ? Any suggestions are appreciated. Thanks
>>>>>>>
>>>>>>>
>>>>>>> Want to work at Handy? Check out our culture deck and open roles
>>>>>>> <http://www.handy.com/careers>
>>>>>>> Latest news <http://www.handy.com/press> at Handy
>>>>>>> Handy just raised $50m
>>>>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>>>>>>  led
>>>>>>> by Fidelity
>>>>>>>
>>>>>>>
>>>>>
>>>
>


De-identification_in Hive

2016-03-19 Thread Ajay Chander
Hi Everyone,

I have a CSV file with some sensitive data in a particular column. Now I
have to create a table in Hive and load the data into it, but when loading
the data I have to make sure that the data is masked. Is there any built-in
function which supports this, or do I have to write a UDF? Any suggestions
are appreciated. Thanks


Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Tustin, is there any way I can de-identify it in Hive?

On Thursday, March 17, 2016, Marcin Tustin  wrote:

> This is a classic transform-load problem. You'll want to anonymise it once
> before making it available for analysis.
>
> On Thursday, March 17, 2016, Ajay Chander  > wrote:
>
>> Hi Everyone,
>>
>> I have a csv.file which has some sensitive data in a particular column
>> in it.  Now I have to create a table in hive and load the data into it. But
>> when loading the data I have to make sure that the data is masked. Is there
>> any built in function is used ch supports this or do I have to write UDF ?
>> Any suggestions are appreciated. Thanks
>
>
>
>


Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Mich, I am okay with replacing the column's data with some character like
an asterisk. Thanks

On Thursday, March 17, 2016, Mich Talebzadeh 
wrote:

> Hi Ajay,
>
> Do you want to be able to unmask it (at any time) or just have it totally
> scrambled (for example replace the column with random characters) in Hive?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 March 2016 at 15:14, Ajay Chander  > wrote:
>
>> Mich, thanks for looking into this. I have a 'csvfile.txt' on HDFS. I
>> have created an external table 'xyz' to load that data into it. One of
>> the columns, 'ssn', needs to be masked. Is there any built-in function
>> that I could use?
>>
>>
>> On Thursday, March 17, 2016, Mich Talebzadeh > > wrote:
>>
>>> Are you loading your CSV file from an External table into Hive table.?
>>>
>>> Basically you want to scramble that column before putting into Hive
>>> table?
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 March 2016 at 14:37, Ajay Chander  wrote:
>>>
>>>> Tustin, Is there anyway I can deidentify it in hive ?
>>>>
>>>>
>>>> On Thursday, March 17, 2016, Marcin Tustin 
>>>> wrote:
>>>>
>>>>> This is a classic transform-load problem. You'll want to anonymise it
>>>>> once before making it available for analysis.
>>>>>
>>>>> On Thursday, March 17, 2016, Ajay Chander 
>>>>> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I have a csv.file which has some sensitive data in a particular
>>>>>> column in it.  Now I have to create a table in hive and load the data 
>>>>>> into
>>>>>> it. But when loading the data I have to make sure that the data is 
>>>>>> masked.
>>>>>> Is there any built in function is used ch supports this or do I have to
>>>>>> write UDF ? Any suggestions are appreciated. Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>


Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Mich, thanks for looking into this. I have a 'csvfile.txt' on HDFS. I have
created an external table 'xyz' to load that data into it. One of the
columns, 'ssn', needs to be masked. Is there any built-in function that I
could use?

On Thursday, March 17, 2016, Mich Talebzadeh 
wrote:

> Are you loading your CSV file from an External table into Hive table.?
>
> Basically you want to scramble that column before putting into Hive table?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 March 2016 at 14:37, Ajay Chander  > wrote:
>
>> Tustin, Is there anyway I can deidentify it in hive ?
>>
>>
>> On Thursday, March 17, 2016, Marcin Tustin > > wrote:
>>
>>> This is a classic transform-load problem. You'll want to anonymise it
>>> once before making it available for analysis.
>>>
>>> On Thursday, March 17, 2016, Ajay Chander  wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I have a csv.file which has some sensitive data in a particular column
>>>> in it.  Now I have to create a table in hive and load the data into it. But
>>>> when loading the data I have to make sure that the data is masked. Is there
>>>> any built in function is used ch supports this or do I have to write UDF ?
>>>> Any suggestions are appreciated. Thanks
>>>
>>>
>>>
>>>
>


Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Jörn, I have around a hundred big CSV files on my local machine. Each file
has a number of columns containing sensitive information. I don't want to
drop those columns manually.

Now I have to bring these files into Hive external tables, but I want to
make sure that the columns with sensitive information are masked. Is there
any way of doing this? Thanks for your time!

On Thursday, March 17, 2016, Jörn Franke  wrote:

> What are your requirements? Do you need to omit a column? Transform it?
> Make the anonymized version joinable etc. there is not simply one function.
>
> > On 17 Mar 2016, at 14:58, Ajay Chander  > wrote:
> >
> > Hi Everyone,
> >
> > I have a csv.file which has some sensitive data in a particular column
> in it.  Now I have to create a table in hive and load the data into it. But
> when loading the data I have to make sure that the data is masked. Is there
> any built in function is used ch supports this or do I have to write UDF ?
> Any suggestions are appreciated. Thanks
>


Re: Hive_CSV

2016-03-09 Thread Ajay Chander
Daniel, thanks for your time. So it's like creating two tables: one to hold
all the data, and another to fetch the required data out of it? If that is
the case, I am just concerned about redundant data. Please correct me if I
am wrong. Thanks

On Wednesday, March 9, 2016, Daniel Haviv 
wrote:

> Hi Ajay,
> Use the CSV serde to read your file, map all three columns but only select
> the relevant ones when you insert:
>
> Create table csvtab (
> irrelevant string,
> sportName string,
> sportType string) ...
>
> Insert into loaded_table select sportName, sportType from csvtab;
>
> Daniel
>
> > On 9 Mar 2016, at 19:43, Ajay Chander  > wrote:
> >
> > Hi Everyone,
> >
> > I am looking for a way, to ignore the first occurrence of the delimiter
> while loading the data from csv file to hive external table.
> >
> > Csv file:
> >
> > Xyz, baseball, outdoor
> >
> > Hive table has two columns sport_name & sport_type and fields are
> separated by ','
> >
> > Now I want to load by data into table such that while loading it has to
> ignore the first delimiter that ignore xyz and load the data from second
> delimiter.
> >
> > In the end my hive table should have the following data,
> >
> > Baseball, outdoor .
> >
> > Any inputs are appreciated. Thank you for your time.
>


Re: Hive_CSV

2016-03-09 Thread Ajay Chander
Jörn, thanks for your time. The reason I wanted to do this is that I don't
want to bring unnecessary data into the table. Each record carries an
unnecessary value.

On Wednesday, March 9, 2016, Jörn Franke  wrote:

>
> Why Don't you load all data and use just two columns for querying?
> Alternatively use regular expressions.
>
>
>
> > On 09 Mar 2016, at 18:43, Ajay Chander  > wrote:
> >
> > Hi Everyone,
> >
> > I am looking for a way, to ignore the first occurrence of the delimiter
> while loading the data from csv file to hive external table.
> >
> > Csv file:
> >
> > Xyz, baseball, outdoor
> >
> > Hive table has two columns sport_name & sport_type and fields are
> separated by ','
> >
> > Now I want to load by data into table such that while loading it has to
> ignore the first delimiter that ignore xyz and load the data from second
> delimiter.
> >
> > In the end my hive table should have the following data,
> >
> > Baseball, outdoor .
> >
> > Any inputs are appreciated. Thank you for your time.
>
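The regular-expression route Jörn mentions can be applied at table-definition time with Hive's bundled RegexSerDe, so the leading value never lands in the table at all. A sketch in which the table name and location are placeholders:

```shell
# Hypothetical sketch: the regex matches but does not capture the first
# comma-separated field, so only the 2nd and 3rd fields become columns.
hive -e '
CREATE EXTERNAL TABLE sports (sport_name STRING, sport_type STRING)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.RegexSerDe"
WITH SERDEPROPERTIES ("input.regex" = "^[^,]*,\\s*([^,]*),\\s*(.*)$")
LOCATION "/data/sports";'
```

With the sample row "Xyz, baseball, outdoor" this would read sport_name = 'baseball' and sport_type = 'outdoor', with no second table or INSERT needed; the trade-off is that regex parsing is slower than the plain delimited SerDe.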


Hive_CSV

2016-03-09 Thread Ajay Chander
Hi Everyone,

I am looking for a way to ignore the first occurrence of the delimiter
while loading data from a CSV file into a Hive external table.

Csv file:

Xyz, baseball, outdoor

The Hive table has two columns, sport_name & sport_type, and fields are
separated by ','

Now I want to load my data into the table such that, while loading, it
ignores everything up to the first delimiter (that is, it ignores 'Xyz' and
loads the data starting from the second field).

In the end my hive table should have the following data,

Baseball, outdoor.

Any inputs are appreciated. Thank you for your time.