Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Sahil Takiar
Hey Everyone,

Yes we plan to stream and record the meetup. More details on how to access
the stream / recordings to come.

--Sahil

On Wed, May 2, 2018 at 9:55 AM, Ajay Chander  wrote:

> +1 for streaming or a recording. Thanks
>
> On Wed, May 2, 2018 at 10:54 AM Elliot West  wrote:
>
>> +1 for streaming or a recording. Content looks excellent.
>>
>> On 2 May 2018 at 15:51, dan young  wrote:
>>
>>> Looks like great talks, will this be streamed anywhere?
>>>
>>> On Wed, May 2, 2018, 8:48 AM Sahil Takiar 
>>> wrote:
>>>
 Hey Everyone,

 The agenda for the meetup has been set and I'm excited to say we have
 lots of interesting talks scheduled! Below is the final agenda; the full list
 of abstracts will be sent out soon. If you are planning to attend, please
 RSVP on the meetup link so we can get an accurate headcount of attendees (
 https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).

 6:30 - 7:00 PM Networking and Refreshments
 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total

- What's new in Hive 3.0.0 - Ashutosh Chauhan
- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
- Parquet Vectorization in Hive - Vihang Karajgaonkar
- ORC Column Level Encryption - Owen O’Malley
- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
- Materialized Views in Hive - Jesus Camacho Rodriguez

 8:30 PM - 9:00 PM Hive Metastore Panel

- Moderator: Vihang Karajgaonkar
- Participants:
   - Daniel Dai - Hive Metastore Caching
   - Alan Gates - Hive Metastore Separation
   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
   Metadata

 The Metastore panel will consist of a short presentation by each
 panelist followed by a Q&A session driven by the moderator.

 On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
 wrote:

> We still have a few slots open for lightning talks, so if anyone is
> interested in giving a presentation don't hesitate to reach out!
>
> If you are planning to attend the meetup, please RSVP on the Meetup
> link (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/)
> so that we can get an accurate headcount for food.
>
> Thanks!
>
> --Sahil
>
> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
> wrote:
>
>> Hi all,
>>
>> I'm happy to announce that the Hive community is organizing a Hive
>> user group meeting in the Bay Area next month. The details can be found 
>> at
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>
>> The format of this meetup will be slightly different from previous
>> ones. There will be one hour dedicated to lightning talks, followed by a
>> group discussion on the future of the Hive Metastore.
>>
>> We are inviting talk proposals from Hive users as well as developers
>> at this time. Please contact either myself (takiar.sa...@gmail.com),
>> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
>> pv...@cloudera.com) with proposals. We currently have 5 openings.
>>
>> Please let me know if you have any questions or suggestions.
>>
>> Thanks,
>> Sahil
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



 --
 Sahil Takiar
 Software Engineer
 takiar.sa...@gmail.com | (510) 673-0309

>>>
>>


-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Hive External Table with Zero Bytes files

2018-05-02 Thread Nishanth S
I have run into a similar issue with Avro files. The solution was to fix the
upstream jobs that were writing data to those directories. In our case the
writers were not flushed/closed correctly during certain events, which caused
the issue. Fixing those prevented these 0-sized files.

-NS

On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam <
mahender.bigd...@outlook.com> wrote:

> ping..
>
> On 5/1/2018 3:57 AM, Mahender Sarangam wrote:
>
> Thanks Thai. I mentioned the folder name wrongly; it is the same DAY=20180325
> (folder), and the file has the same name. Actually, in our upstream, our source
> table is partitioned by date. Whenever a table is partitioned, we see zero-byte
> files. Now, when we create an external table with PARTITIONED BY columns and run
> a select query, no data is returned. If I manually delete those zero-byte files,
> we are able to read the data.
>
>
> /Mahender
>
> On 4/28/2018 6:36 AM, Thai Bui wrote:
>
> Your external table is referencing the .../day=201803250 location which is
> empty. Point your table to the capital .../DAY=201803250 and you should be
> able to read the data there.
>
> Also, it looks like you want an external partitioned table. You’ll need to
> create an external table with a partition clause, then alter the table and
> add a partition for each of the ../DAY=someday paths that you have.
>
> On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <
> mahender.bigd...@outlook.com> wrote:
>
>> Gentle ping. Please help me with the issue below. Has anyone faced the same issue?
>>
>> On 4/27/2018 1:28 AM, Mahender Sarangam wrote:
>>
>> Hi,
>>
>> Has anyone faced an issue while fetching data from an external table? We are
>> copying data from an upstream system into our S3 storage. As part of the copy,
>> directories along with zero-byte files are being copied. The source files
>> are in JSON format. Below is the folder hierarchy:
>>
>>
>>  DATE  -->  
>>
>> ---> Folder
>>
>>  1.json.gz  --> File
>>
>>   2.json.gz
>>
>>  ---> Empty Zero Bytes Files.
>>
>> Please find below screenshot
>>
>> We are trying to create external table with JSON Serde.
>>
>> ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
>> SET hive.mapred.supports.subdirectories=TRUE;
>> SET mapred.input.dir.recursive=TRUE;
>> SET hive.merge.mapfiles = true;
>> SET hive.merge.mapredfiles = true;
>> SET hive.merge.tezfiles = true;
>>
>> DROP TABLE IF EXISTS Ext_STG1;
>> CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String)
>> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
>> WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
>> STORED AS TEXTFILE
>> LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
>> TBLPROPERTIES ('serialization.null.format' = '');
>>
>> select * from Ext_STG1 limit 100;
>>
>>
>> The above query returns empty results.
>>
>>
>> When I delete the zero-byte files, I can see data when selecting from the
>> external table. Is this expected behaviour? Is there any setting for
>> ignoring zero-byte files in a Hive external table?
>>
>>
>> -Mahens
>>
>>
>> --
> Thai
>
>
>
>


Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Ajay Chander
+1 for streaming or a recording. Thanks

On Wed, May 2, 2018 at 10:54 AM Elliot West  wrote:

> +1 for streaming or a recording. Content looks excellent.
>
> On 2 May 2018 at 15:51, dan young  wrote:
>
>> Looks like great talks, will this be streamed anywhere?
>>
>> On Wed, May 2, 2018, 8:48 AM Sahil Takiar  wrote:
>>
>>> Hey Everyone,
>>>
>>> The agenda for the meetup has been set and I'm excited to say we have
>>> lots of interesting talks scheduled! Below is the final agenda; the full list
>>> of abstracts will be sent out soon. If you are planning to attend, please
>>> RSVP on the meetup link so we can get an accurate headcount of attendees (
>>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>>>
>>> 6:30 - 7:00 PM Networking and Refreshments
>>> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>>>
>>>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>>>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>>>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>>>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>>>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>>>- ORC Column Level Encryption - Owen O’Malley
>>>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>>>- Materialized Views in Hive - Jesus Camacho Rodriguez
>>>
>>> 8:30 PM - 9:00 PM Hive Metastore Panel
>>>
>>>- Moderator: Vihang Karajgaonkar
>>>- Participants:
>>>   - Daniel Dai - Hive Metastore Caching
>>>   - Alan Gates - Hive Metastore Separation
>>>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>>>   Metadata
>>>
>>> The Metastore panel will consist of a short presentation by each
>>> panelist followed by a Q&A session driven by the moderator.
>>>
>>> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
>>> wrote:
>>>
 We still have a few slots open for lightning talks, so if anyone is
 interested in giving a presentation don't hesitate to reach out!

 If you are planning to attend the meetup, please RSVP on the Meetup
 link (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/)
 so that we can get an accurate headcount for food.

 Thanks!

 --Sahil

 On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
 wrote:

> Hi all,
>
> I'm happy to announce that the Hive community is organizing a Hive
> user group meeting in the Bay Area next month. The details can be found at
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>
> The format of this meetup will be slightly different from previous
> ones. There will be one hour dedicated to lightning talks, followed by a
> group discussion on the future of the Hive Metastore.
>
> We are inviting talk proposals from Hive users as well as developers
> at this time. Please contact either myself (takiar.sa...@gmail.com),
> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
> pv...@cloudera.com) with proposals. We currently have 5 openings.
>
> Please let me know if you have any questions or suggestions.
>
> Thanks,
> Sahil
>



 --
 Sahil Takiar
 Software Engineer
 takiar.sa...@gmail.com | (510) 673-0309

>>>
>>>
>>>
>>> --
>>> Sahil Takiar
>>> Software Engineer
>>> takiar.sa...@gmail.com | (510) 673-0309
>>>
>>
>


Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Elliot West
+1 for streaming or a recording. Content looks excellent.

On 2 May 2018 at 15:51, dan young  wrote:

> Looks like great talks, will this be streamed anywhere?
>
> On Wed, May 2, 2018, 8:48 AM Sahil Takiar  wrote:
>
>> Hey Everyone,
>>
>> The agenda for the meetup has been set and I'm excited to say we have
>> lots of interesting talks scheduled! Below is the final agenda; the full list
>> of abstracts will be sent out soon. If you are planning to attend, please
>> RSVP on the meetup link so we can get an accurate headcount of attendees (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>>
>> 6:30 - 7:00 PM Networking and Refreshments
>> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>>
>>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>>- ORC Column Level Encryption - Owen O’Malley
>>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>>- Materialized Views in Hive - Jesus Camacho Rodriguez
>>
>> 8:30 PM - 9:00 PM Hive Metastore Panel
>>
>>- Moderator: Vihang Karajgaonkar
>>- Participants:
>>   - Daniel Dai - Hive Metastore Caching
>>   - Alan Gates - Hive Metastore Separation
>>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>>   Metadata
>>
>> The Metastore panel will consist of a short presentation by each panelist
>> followed by a Q&A session driven by the moderator.
>>
>> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
>> wrote:
>>
>>> We still have a few slots open for lightning talks, so if anyone is
>>> interested in giving a presentation don't hesitate to reach out!
>>>
>>> If you are planning to attend the meetup, please RSVP on the Meetup link
>>> (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so
>>> that we can get an accurate headcount for food.
>>>
>>> Thanks!
>>>
>>> --Sahil
>>>
>>> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
>>> wrote:
>>>
 Hi all,

 I'm happy to announce that the Hive community is organizing a Hive user
 group meeting in the Bay Area next month. The details can be found at
 https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

 The format of this meetup will be slightly different from previous
 ones. There will be one hour dedicated to lightning talks, followed by a
 group discussion on the future of the Hive Metastore.

 We are inviting talk proposals from Hive users as well as developers at
 this time. Please contact either myself (takiar.sa...@gmail.com),
 Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
 pv...@cloudera.com) with proposals. We currently have 5 openings.

 Please let me know if you have any questions or suggestions.

 Thanks,
 Sahil

>>>
>>>
>>>
>>> --
>>> Sahil Takiar
>>> Software Engineer
>>> takiar.sa...@gmail.com | (510) 673-0309
>>>
>>
>>
>>
>> --
>> Sahil Takiar
>> Software Engineer
>> takiar.sa...@gmail.com | (510) 673-0309
>>
>


Re: May 2018 Hive User Group Meeting

2018-05-02 Thread dan young
Looks like great talks, will this be streamed anywhere?

On Wed, May 2, 2018, 8:48 AM Sahil Takiar  wrote:

> Hey Everyone,
>
> The agenda for the meetup has been set and I'm excited to say we have lots
> of interesting talks scheduled! Below is the final agenda; the full list of
> abstracts will be sent out soon. If you are planning to attend, please RSVP
> on the meetup link so we can get an accurate headcount of attendees (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>
> 6:30 - 7:00 PM Networking and Refreshments
> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>
>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>- ORC Column Level Encryption - Owen O’Malley
>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>- Materialized Views in Hive - Jesus Camacho Rodriguez
>
> 8:30 PM - 9:00 PM Hive Metastore Panel
>
>- Moderator: Vihang Karajgaonkar
>- Participants:
>   - Daniel Dai - Hive Metastore Caching
>   - Alan Gates - Hive Metastore Separation
>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>   Metadata
>
> The Metastore panel will consist of a short presentation by each panelist
> followed by a Q&A session driven by the moderator.
>
> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
> wrote:
>
>> We still have a few slots open for lightning talks, so if anyone is
>> interested in giving a presentation don't hesitate to reach out!
>>
>> If you are planning to attend the meetup, please RSVP on the Meetup link (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so
>> that we can get an accurate headcount for food.
>>
>> Thanks!
>>
>> --Sahil
>>
>> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm happy to announce that the Hive community is organizing a Hive user
>>> group meeting in the Bay Area next month. The details can be found at
>>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>>
>>> The format of this meetup will be slightly different from previous ones.
>>> There will be one hour dedicated to lightning talks, followed by a group
>>> discussion on the future of the Hive Metastore.
>>>
>>> We are inviting talk proposals from Hive users as well as developers at
>>> this time. Please contact either myself (takiar.sa...@gmail.com),
>>> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
>>> pv...@cloudera.com) with proposals. We currently have 5 openings.
>>>
>>> Please let me know if you have any questions or suggestions.
>>>
>>> Thanks,
>>> Sahil
>>>
>>
>>
>>
>> --
>> Sahil Takiar
>> Software Engineer
>> takiar.sa...@gmail.com | (510) 673-0309
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>


Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Sahil Takiar
Hey Everyone,

The agenda for the meetup has been set and I'm excited to say we have lots
of interesting talks scheduled! Below is the final agenda; the full list of
abstracts will be sent out soon. If you are planning to attend, please RSVP
on the meetup link so we can get an accurate headcount of attendees (
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).

6:30 - 7:00 PM Networking and Refreshments
7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total

   - What's new in Hive 3.0.0 - Ashutosh Chauhan
   - Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
   - Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
   - Dali: Data Access Layer at LinkedIn - Adwait Tumbde
   - Parquet Vectorization in Hive - Vihang Karajgaonkar
   - ORC Column Level Encryption - Owen O’Malley
   - Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
   - Materialized Views in Hive - Jesus Camacho Rodriguez

8:30 PM - 9:00 PM Hive Metastore Panel

   - Moderator: Vihang Karajgaonkar
   - Participants:
  - Daniel Dai - Hive Metastore Caching
  - Alan Gates - Hive Metastore Separation
  - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
  Metadata

The Metastore panel will consist of a short presentation by each panelist
followed by a Q&A session driven by the moderator.

On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
wrote:

> We still have a few slots open for lightning talks, so if anyone is
> interested in giving a presentation don't hesitate to reach out!
>
> If you are planning to attend the meetup, please RSVP on the Meetup link (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so that
> we can get an accurate headcount for food.
>
> Thanks!
>
> --Sahil
>
> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
> wrote:
>
>> Hi all,
>>
>> I'm happy to announce that the Hive community is organizing a Hive user
>> group meeting in the Bay Area next month. The details can be found at
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>
>> The format of this meetup will be slightly different from previous ones.
>> There will be one hour dedicated to lightning talks, followed by a group
>> discussion on the future of the Hive Metastore.
>>
>> We are inviting talk proposals from Hive users as well as developers at
>> this time. Please contact either myself (takiar.sa...@gmail.com), Vihang
>> Karajgaonkar (vih...@cloudera.com), or Peter Vary (pv...@cloudera.com)
>> with proposals. We currently have 5 openings.
>>
>> Please let me know if you have any questions or suggestions.
>>
>> Thanks,
>> Sahil
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Hive External Table with Zero Bytes files

2018-05-02 Thread Mahender Sarangam
ping..

On 5/1/2018 3:57 AM, Mahender Sarangam wrote:

Thanks Thai. I mentioned the folder name wrongly; it is the same DAY=20180325
(folder), and the file has the same name. Actually, in our upstream, our source
table is partitioned by date. Whenever a table is partitioned, we see zero-byte
files. Now, when we create an external table with PARTITIONED BY columns and run
a select query, no data is returned. If I manually delete those zero-byte files,
we are able to read the data.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is 
empty. Point your table to the capital .../DAY=201803250 and you should be able 
to read the data there.

Also, it looks like you want an external partitioned table. You’ll need to create
an external table with a partition clause, then alter the table and add a
partition for each of the ../DAY=someday paths that you have.
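
A minimal sketch of the approach Thai describes, not a tested fix. It assumes
a partition column named dt and a table name Ext_STG1_part (both hypothetical),
and uses <container> and <account> as placeholders because the storage location
is elided in the original message. The column list and SerDe settings are copied
from the CREATE TABLE statement in this thread; the day value follows the
DAY=20180325 folder mentioned above.

```sql
-- Assumes the JSON SerDe jar has already been registered with ADD JAR.
-- Ext_STG1_part and dt are hypothetical names chosen for illustration.
CREATE EXTERNAL TABLE Ext_STG1_part (Col1 STRING, Col2 STRING, Col3 STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://<container>@<account>.blob.core.windows.net/date/';

-- Register one partition per day folder, pointing at the upper-case path.
ALTER TABLE Ext_STG1_part ADD IF NOT EXISTS
PARTITION (dt = '20180325')
LOCATION 'wasb://<container>@<account>.blob.core.windows.net/date/DAY=20180325/';

-- Read back a single partition.
SELECT * FROM Ext_STG1_part WHERE dt = '20180325' LIMIT 100;
```

Because each partition is registered with an explicit LOCATION, the upper-case
DAY= folder name does not have to match the partition column name. One ALTER
TABLE ... ADD PARTITION statement is needed per day folder. This only covers the
table layout; whether it changes how the zero-byte files are handled is not
established in this thread.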

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

Gentle ping. Please help me with the issue below. Has anyone faced the same issue?

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Has anyone faced an issue while fetching data from an external table? We are
copying data from an upstream system into our S3 storage. As part of the copy,
directories along with zero-byte files are being copied. The source files are in
JSON format. Below is the folder hierarchy:


 DATE  -->  

---> Folder

 1.json.gz  --> File

  2.json.gz

 ---> Empty Zero Bytes Files.

Please find below screenshot


We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;

DROP TABLE IF EXISTS Ext_STG1;
CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


The above query returns empty results.


When I delete the zero-byte files, I can see data when selecting from the
external table. Is this expected behaviour? Is there any setting for ignoring
zero-byte files in a Hive external table?


-Mahens

--
Thai