Re: [VOTE] Should we release storage-api 2.6.1 rc0?

2018-05-09 Thread Deepak Jaiswal
With 3 +1s and no -1s the vote passes. Thanks Owen and Jesus.

Regards,
Deepak

On 5/9/18, 9:49 PM, "Owen O'Malley"  wrote:

+1
checked signatures
compiled and ran tests

On Wed, May 9, 2018 at 8:58 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> +1
> - compiled from src
> - ran unit tests
>
> -Jesús
>
>
> On 5/9/18, 11:48 AM, "Deepak Jaiswal"  wrote:
>
> Ping!
>
> On 5/8/18, 12:59 PM, "Deepak Jaiswal" 
> wrote:
>
> All,
>
> I would like to make a new release of the storage-api. It contains
> key changes related to Murmur3 hash.
>
> Artifacts:
>
> Tag: https://github.com/apache/hive/releases/tag/storage-release-2.6.1-rc0
> Tar ball : http://home.apache.org/~djaiswal/hive-storage-2.6.1/
>
> Thanks,
> Deepak
>
>
>
>
>
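For context on the Murmur3 changes this release centers on: the 32-bit x86 MurmurHash3 variant can be sketched in Python as below. This is an illustrative port only, not the storage-api Java implementation being voted on.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant (illustrative Python port)."""
    c1, c2 = 0xcc9e2d51, 0x1b873593
    h = seed & 0xffffffff
    n = len(data)
    # Body: mix one 32-bit little-endian block at a time.
    for i in range(0, n - n % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xffffffff
        h = (h * 5 + 0xe6546b64) & 0xffffffff
    # Tail: up to 3 trailing bytes.
    tail = data[n - n % 4:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
    # Finalization: avalanche the length and remaining state.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h
```

The empty-input values below are the published MurmurHash3 test vectors.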




Re: [VOTE] Should we release storage-api 2.6.1 rc0?

2018-05-09 Thread Owen O'Malley
+1
checked signatures
compiled and ran tests

On Wed, May 9, 2018 at 8:58 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> +1
> - compiled from src
> - ran unit tests
>
> -Jesús
>
>
> On 5/9/18, 11:48 AM, "Deepak Jaiswal"  wrote:
>
> Ping!
>
> On 5/8/18, 12:59 PM, "Deepak Jaiswal" 
> wrote:
>
> All,
>
> I would like to make a new release of the storage-api. It contains
> key changes related to Murmur3 hash.
>
> Artifacts:
>
> Tag: https://github.com/apache/hive/releases/tag/storage-release-2.6.1-rc0
> Tar ball : http://home.apache.org/~djaiswal/hive-storage-2.6.1/
>
> Thanks,
> Deepak
>
>
>
>
>


[jira] [Created] (HIVE-19484) 'IN' & '=' do not behave the same way for Date/Timestamp comparison.

2018-05-09 Thread Venu Yanamandra (JIRA)
Venu Yanamandra created HIVE-19484:
--

 Summary: 'IN' & '=' do not behave the same way for Date/Timestamp 
comparison.
 Key: HIVE-19484
 URL: https://issues.apache.org/jira/browse/HIVE-19484
 Project: Hive
  Issue Type: Bug
Reporter: Venu Yanamandra


We find that the '=' and 'IN' operators behave differently when comparing 
timestamps.

The issue can be demonstrated as follows:
   i) create table test_table (test_date timestamp);
  ii) insert into test_table values('2018-01-01');
 iii) select * from test_table where test_date='2018-01-01'; -- Works
  iv) select * from test_table where test_date in ('2018-01-01'); -- Fails with 
error [1]
   v) However, casting works - 
      select * from test_table where test_date in (cast ('2018-01-01' as 
timestamp));

Per url [2], we find no references to limitations when '=' or 'IN' are used.

Per url [3], implicit type conversions are defined. However, '=' applies them 
while 'IN' does not.

We would like 'IN' to behave the same way as '='.
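A toy model of the asymmetry reported above (illustrative only — Hive's actual coercion logic lives in its UDF type-checking code, not in anything resembling this sketch):

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d"

def eq(ts, literal):
    """'=' coerces a string literal to a timestamp before comparing."""
    if isinstance(literal, str):
        literal = datetime.strptime(literal, TS_FORMAT)
    return ts == literal

def in_list(ts, literals):
    """'IN' (per the reported behavior) insists on matching types."""
    for lit in literals:
        if type(lit) is not type(ts):
            raise TypeError(
                "The arguments for IN should be the same type! "
                "Types are: {timestamp IN (string)}")
    return ts in literals

ts = datetime(2018, 1, 1)
eq(ts, "2018-01-01")                  # works, like '='
# in_list(ts, ["2018-01-01"])         # raises TypeError, mirroring error [1]
in_list(ts, [datetime(2018, 1, 1)])   # works after an explicit cast
```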


[1]:
 Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: The arguments 
for IN should be the same type! Types are: {timestamp IN (string)}

[2]:
 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-LogicalOperators

[3]:
 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Should we release storage-api 2.6.1 rc0?

2018-05-09 Thread Jesus Camacho Rodriguez
+1
- compiled from src
- ran unit tests

-Jesús


On 5/9/18, 11:48 AM, "Deepak Jaiswal"  wrote:

Ping!

On 5/8/18, 12:59 PM, "Deepak Jaiswal"  wrote:

All,

I would like to make a new release of the storage-api. It contains key 
changes related to Murmur3 hash.

Artifacts:

Tag: 
https://github.com/apache/hive/releases/tag/storage-release-2.6.1-rc0
Tar ball : http://home.apache.org/~djaiswal/hive-storage-2.6.1/

Thanks,
Deepak






[jira] [Created] (HIVE-19483) Metastore cleaner tasks that run periodically are created more than once

2018-05-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19483:
--

 Summary: Metastore cleaner tasks that run periodically are created 
more than once
 Key: HIVE-19483
 URL: https://issues.apache.org/jira/browse/HIVE-19483
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19482) Metastore cleaner tasks that run periodically are created more than once

2018-05-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19482:
--

 Summary: Metastore cleaner tasks that run periodically are created 
more than once
 Key: HIVE-19482
 URL: https://issues.apache.org/jira/browse/HIVE-19482
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-19482.patch

This can lead to a large number of cleaner objects depending on the number of 
metastore clients.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: My Hive QA job failed with "No space left on device (28)"

2018-05-09 Thread Vihang Karajgaonkar
I am seeing a lot of statements in hive.log which look like these

2018-05-07T05:28:15,113 DEBUG [main] ldif.LdifReader: Ldif version : 1
2018-05-07T05:28:15,114 DEBUG [main] ldif.LdifReader: Read an entry : dn:
m-oid=2.5.13.19,ou=normalizers,cn=system,ou=schema
changetype: add
createtimestamp: 20090818022727Z
entrycsn: 20090818052727.84Z#00#000#00
m-oid: 2.5.13.19
m-fqcn: org.apache.directory.shared.ldap.schema.normalizers.NoOpNormalizer
creatorsname: uid=admin,ou=system
objectclass: metaNormalizer
objectclass: metaTop
objectclass: top
Does anyone know what these are and why they are spewing at such a high
frequency?
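These look like entries from the embedded ApacheDS LDAP server used by some tests reading its schema LDIF at DEBUG level. Assuming the test harness reads a log4j2 properties file (an assumption — the actual logging setup may differ), one way to quiet them would be a snippet along these lines:

```properties
# Hypothetical log4j2 properties fragment: raise the level for the
# Apache Directory (ApacheDS) packages emitting the LdifReader spew.
logger.apacheds.name = org.apache.directory
logger.apacheds.level = WARN
```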

On Wed, May 9, 2018 at 4:05 PM, Vihang Karajgaonkar 
wrote:

> Adding the correct dev list.
>
> It looks like we are generating a lot more logs than usual. Each precommit
> is generating ~22G of logs and the disk eventually ran out. I had to delete
> logs for PreCommit-HIVE-Build-1075[0-9] and
> PreCommit-HIVE-Build-1074[5-9] jobs to release the disk space. This usually
> happens if some patch adds a super-verbose debug log.
>
>
> On Wed, May 9, 2018 at 3:41 PM, Vihang Karajgaonkar 
> wrote:
>
>> Hi Matt,
>>
>> Thanks for reporting the issue. Where do you see this error message? I
>> tried to find it but couldn't search for this in the console output.
>>
>> Thanks,
>> Vihang
>>
>> On Wed, May 9, 2018 at 2:05 PM, Matthew McCline > > wrote:
>>
>>> https://builds.apache.org/job/PreCommit-HIVE-Build/10785​
>>>
>>>
>>> Thanks
>>>
>>
>>
>


Re: My Hive QA job failed with "No space left on device (28)"

2018-05-09 Thread Vihang Karajgaonkar
Adding the correct dev list.

It looks like we are generating a lot more logs than usual. Each precommit
is generating ~22G of logs and the disk eventually ran out. I had to delete
logs for PreCommit-HIVE-Build-1075[0-9] and PreCommit-HIVE-Build-1074[5-9]
jobs to release the disk space. This usually happens if some patch adds a
super-verbose debug log.


On Wed, May 9, 2018 at 3:41 PM, Vihang Karajgaonkar 
wrote:

> Hi Matt,
>
> Thanks for reporting the issue. Where do you see this error message? I
> tried to find it but couldn't search for this in the console output.
>
> Thanks,
> Vihang
>
> On Wed, May 9, 2018 at 2:05 PM, Matthew McCline 
> wrote:
>
>> https://builds.apache.org/job/PreCommit-HIVE-Build/10785​
>>
>>
>> Thanks
>>
>
>


Review Request 67042: HIVE-19449: Create minimized uber jar for hive streaming module

2018-05-09 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67042/
---

Review request for hive, Eugene Koifman and Sergey Shelukhin.


Bugs: HIVE-19449
https://issues.apache.org/jira/browse/HIVE-19449


Repository: hive-git


Description
---

HIVE-19449: Create minimized uber jar for hive streaming module


Diffs
-

  itests/hive-unit/pom.xml 26e423c5ff2d39d729c122a459c33e42bd4e389c 
  streaming/pom.xml ccc55ebb0d36acba7719ef7bcf6d0c4954097187 
  streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java 
3651fa120ae5c753d2a833bc28c9dcfcd24be823 


Diff: https://reviews.apache.org/r/67042/diff/1/


Testing
---


Thanks,

Prasanth_J



[jira] [Created] (HIVE-19481) sample10.q returns possibly wrong results for insert-only transactional table

2018-05-09 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19481:
-

 Summary: sample10.q returns possibly wrong results for insert-only 
transactional table
 Key: HIVE-19481
 URL: https://issues.apache.org/jira/browse/HIVE-19481
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Steve Yeom
 Fix For: 3.1.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202800
---


Ship it!




Ship It!

- Prasanth_J


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> ---
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 1d7eceb1ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread j . prasanth . j


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > 
> >
> > Positions are packed variable in protobuf. 
> > Also there is stream suppression. 
> > Same column in a stripe may have a stream suppressed (say all non-nulls 
> > values, isPresent gets suppressed) then its positions will not be recorded. 
> > Same column in another stripe might have all streams. So all this does is 
> > if the stream does not exist don't move on to next position.
> > 
> > [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> > [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)
> 
> Sergey Shelukhin wrote:
> I'm not sure what you mean... I'm moving this code outside of the 
> 0-length check, and moving into utility method.
> All calls of the same method did the exact same thing before.
> 
> Prasanth_J wrote:
> 
> https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L248-L256
>  
> If a stream is suppressed, its positions are removed.
> 
> So I am not sure if it is safe to advance the positions without checking 
> if the stream exists/available or not.

Actually we remove the positions only for isPresent streams. I think it should 
be ok if we advance positions first before seek for other streams.


- Prasanth_J
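The stream-suppression concern discussed above can be illustrated with a toy model (not the actual ORC reader code): position groups are recorded only for streams that were written, so a reader that blindly advances one group per expected stream would misalign its seeks when isPresent is suppressed.

```python
def align_positions(expected_streams, present_streams, position_groups):
    """Map each present stream to its recorded position group.

    Suppressed streams (e.g. isPresent when a column has no nulls)
    record no positions, so groups line up with present streams only.
    """
    if len(position_groups) != len(present_streams):
        raise ValueError("position groups must match present streams")
    groups = iter(position_groups)
    present = set(present_streams)
    # Advance to the next group only for streams that actually exist.
    return {s: next(groups) for s in expected_streams if s in present}

# Stripe with all streams: positions for isPresent, Data, Length.
full = align_positions(["isPresent", "Data", "Length"],
                       ["isPresent", "Data", "Length"],
                       [[0, 0, 0], [0, 0], [0, 0]])
# Stripe with isPresent suppressed: positions for Data, Length only.
suppressed = align_positions(["isPresent", "Data", "Length"],
                             ["Data", "Length"],
                             [[0, 0], [0, 0]])
```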


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
---


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> ---
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 1d7eceb1ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread j . prasanth . j


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > 
> >
> > Positions are packed variable in protobuf. 
> > Also there is stream suppression. 
> > Same column in a stripe may have a stream suppressed (say all non-nulls 
> > values, isPresent gets suppressed) then its positions will not be recorded. 
> > Same column in another stripe might have all streams. So all this does is 
> > if the stream does not exist don't move on to next position.
> > 
> > [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> > [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)
> 
> Sergey Shelukhin wrote:
> I'm not sure what you mean... I'm moving this code outside of the 
> 0-length check, and moving into utility method.
> All calls of the same method did the exact same thing before.

https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L248-L256
 
If a stream is suppressed, its positions are removed.

So I am not sure if it is safe to advance the positions without checking if the 
stream exists/available or not.


- Prasanth_J


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
---


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> ---
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 1d7eceb1ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



My Hive QA job failed with "No space left on device (28)"

2018-05-09 Thread Matthew McCline
https://builds.apache.org/job/PreCommit-HIVE-Build/10785


Thanks


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/
---

(Updated May 9, 2018, 8:55 p.m.)


Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
 fc0c66a888 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
1d7eceb1ef 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
 42532f9a0e 


Diff: https://reviews.apache.org/r/67039/diff/2/

Changes: https://reviews.apache.org/r/67039/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19480) Implement and Incorporate MAPREDUCE-207

2018-05-09 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-19480:
--

 Summary: Implement and Incorporate MAPREDUCE-207
 Key: HIVE-19480
 URL: https://issues.apache.org/jira/browse/HIVE-19480
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: BELUGA BEHR


* HiveServer2 has the ability to run many MapReduce jobs in parallel.
 * Each MapReduce application calculates the job's file splits at the client 
level.
 * As a result, HiveServer2 loads many file splits at the same time, putting 
pressure on memory.

{quote}"The client running the job calculates the splits for the job by calling 
getSplits(), then sends them to the application master, which uses their 
storage locations to schedule map tasks that will process them on the cluster."
 - "Hadoop: The Definitive Guide"{quote}
MAPREDUCE-207 should address this memory pressure by moving split calculation 
into the ApplicationMaster. Spark and Tez already take this approach.

Once MAPREDUCE-207 is completed, leverage the capability in HiveServer2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 280 (patched)
> > 
> >
> > the reader seeks mostly delegates to data stream seek and length stream 
> > seek (RLE seeks does more than just position seek - it has to seek within 
> > bytes or runs)

This patch only adjusts for InStream seeks


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > 
> >
> > Positions are packed variable in protobuf. 
> > Also there is stream suppression. 
> > Same column in a stripe may have a stream suppressed (say all non-nulls 
> > values, isPresent gets suppressed) then its positions will not be recorded. 
> > Same column in another stripe might have all streams. So all this does is 
> > if the stream does not exist don't move on to next position.
> > 
> > [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> > [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)

I'm not sure what you mean... I'm moving this code outside of the 0-length 
check, and moving into utility method.
All calls of the same method did the exact same thing before.


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1894 (patched)
> > 
> >
> > where is this skipSeek() implementation? I don't see it in the patch or 
> > master.

Looks like it is in ORC on master. Changed to a static method. I edited 
pre-ORC-split impl when originally fixing the issue.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
---


On May 9, 2018, 7:12 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> ---
> 
> (Updated May 9, 2018, 7:12 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 1d7eceb1ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-09 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/
---

(Updated May 9, 2018, 8:46 p.m.)


Review request for hive, Sahil Takiar and Vihang Karajgaonkar.


Changes
---

Reverted the checkstyle/formatting only related changes.
Responded to the review comments.

Now no formatting related changes are present in the patch.


Bugs: HIVE-14388
https://issues.apache.org/jira/browse/HIVE-14388


Repository: hive-git


Description
---

Currently, when you run an insert command in Beeline, it returns a message 
saying "No rows affected .."
A better, more intuitive message would be "xxx rows inserted (26.068 seconds)".

Added the numRows parameter as part of QueryState.
Also added numRows to the response so Beeline can display it.

The count is gathered in FileSinkOperator and set in statsMap only when the 
operator writes table-specific rows, so that we count only the rows inserted 
into the table and avoid counting non-table file-sink operations that happen 
during query execution.
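The flow described above can be sketched as follows. This is a simplified model: the class names mirror the patch, but the bodies and the `writes_to_table` flag are illustrative, not the real Hive code.

```python
class QueryState:
    """Carries per-query state; the patch adds a numRows field here."""
    def __init__(self):
        self.num_rows = None

class FileSinkOperator:
    """Counts only rows written to the target table, not other file sinks."""
    def __init__(self, writes_to_table):
        self.writes_to_table = writes_to_table
        self.stats = {"numRows": 0}

    def process(self, row):
        if self.writes_to_table:
            self.stats["numRows"] += 1

def format_status(num_rows, seconds):
    """Render the client-facing message Beeline would display."""
    if num_rows is None:
        return "No rows affected ({:.3f} seconds)".format(seconds)
    return "{} rows inserted ({:.3f} seconds)".format(num_rows, seconds)

# A table sink counts rows; a temp/scratch sink does not.
table_sink = FileSinkOperator(writes_to_table=True)
scratch_sink = FileSinkOperator(writes_to_table=False)
for row in range(3):
    table_sink.process(row)
    scratch_sink.process(row)

state = QueryState()
state.num_rows = table_sink.stats["numRows"]
```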


Diffs (updated)
-

  beeline/src/main/resources/BeeLine.properties 
c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
b217259553be472863cd33bb2259aa700e6c3528 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
06542cee02e5dc4696f2621bb45cc4f24c67dfda 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
52799b30c39af2f192c4ae22ce7d68b403014183 
  ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
cf9c2273159c0d779ea90ad029613678fb0967a6 
  ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
01a5b4c9c328cb034a613a1539cea2584e122fb4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
fcdc9967f12a454a9d3f31031e2261f264479118 
  ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
  ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
28f376f8c4c2151383286e754447d1349050ef4e 
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
96819f4e1c446f6de423f99c7697d548ff5dbe06 
  ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
  service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
b2b62c71492b844f4439367364c5c81aa62f3908 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
 15e8220eb3eb12b72c7b64029410dced33bc0d72 
  service-rpc/src/gen/thrift/gen-php/Types.php 
abb7c1ff3a2c8b72dc97689758266b675880e32b 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
60183dae9e9927bd09a9676e49eeb4aea2401737 
  service/src/java/org/apache/hive/service/cli/CLIService.java 
c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
  service/src/java/org/apache/hive/service/cli/OperationStatus.java 
52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 
3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
c64c99120ad21ee98af81ec6659a2722e3e1d1c7 


Diff: https://reviews.apache.org/r/66290/diff/7/

Changes: https://reviews.apache.org/r/66290/diff/6-7/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
---




ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 280 (patched)


the reader seeks mostly delegates to data stream seek and length stream 
seek (RLE seeks does more than just position seek - it has to seek within bytes 
or runs)



ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 1076 (patched)


Positions are packed variable in protobuf. 
Also there is stream suppression. 
Same column in a stripe may have a stream suppressed (say all non-nulls 
values, isPresent gets suppressed) then its positions will not be recorded. 
Same column in another stripe might have all streams. So all this does is if 
the stream does not exist don't move on to next position.

[[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
[[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)



ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 1894 (patched)


where is this skipSeek() implementation? I don't see it in the patch or 
master.


- Prasanth_J


On May 9, 2018, 7:12 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> ---
> 
> (Updated May 9, 2018, 7:12 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
>  fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
> 1d7eceb1ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
>  42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/
---

Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
 fc0c66a888 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 
1d7eceb1ef 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
 42532f9a0e 


Diff: https://reviews.apache.org/r/67039/diff/1/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19479:
---

 Summary: encoded stream seek is incorrect for 0-length RGs in LLAP 
IO
 Key: HIVE-19479
 URL: https://issues.apache.org/jira/browse/HIVE-19479
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-19479.patch





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-09 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On May 8, 2018, 10:42 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
> > Lines 430-439 (patched)
> > 
> >
> > Why did you moved this inside the if statement?

Moved this inside the if statement to avoid null pointer exceptions; I noticed 
in some tests that SessionState could be null. Also added a null check for the 
counter along with that.


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review202625
---


On May 7, 2018, 5:58 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated May 7, 2018, 5:58 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   beeline/src/main/resources/BeeLine.properties 
> c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> b217259553be472863cd33bb2259aa700e6c3528 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 9f4e6f2e53b43839fefe1d2522a75a95d393544f 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
> 18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
>   ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
> 28f376f8c4c2151383286e754447d1349050ef4e 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
> 96819f4e1c446f6de423f99c7697d548ff5dbe06 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: [VOTE] Should we release storage-api 2.6.1 rc0?

2018-05-09 Thread Deepak Jaiswal
Ping!

On 5/8/18, 12:59 PM, "Deepak Jaiswal"  wrote:

All,

I would like to make a new release of the storage-api. It contains key 
changes related to Murmur3 hash.

Artifacts:

Tag: https://github.com/apache/hive/releases/tag/storage-release-2.6.1-rc0
Tar ball : http://home.apache.org/~djaiswal/hive-storage-2.6.1/

Thanks,
Deepak




[jira] [Created] (HIVE-19478) Load Data extension should be able to take more SerDe information

2018-05-09 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-19478:
-

 Summary: Load Data extension should be able to take more SerDe 
information
 Key: HIVE-19478
 URL: https://issues.apache.org/jira/browse/HIVE-19478
 Project: Hive
  Issue Type: Task
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


HIVE-19453 extends Load Data statement to take inputformat and SerDe of the 
source file(s).

Might be useful to be able to pass in SerDe params which are used to initialize 
the SerDe - this could be useful for some SerDes. For example LazySimpleSerDe 
allows you to pass in the field separator, or set the timestamp format etc.
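To make the idea concrete, a sketch of what such an extended statement might look like. Note this is proposed, hypothetical syntax: the WITH SERDEPROPERTIES clause on LOAD DATA does not exist yet, and the file path and table name are made up for illustration.

```sql
-- hypothetical extension: pass SerDe initialization properties with LOAD DATA
LOAD DATA LOCAL INPATH '../../data/files/custom_delim.txt' INTO TABLE src_custom
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '|', 'timestamp.formats' = 'yyyy-MM-dd HH:mm:ss');
```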



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66999: HIVE-19453

2018-05-09 Thread Deepak Jaiswal


> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 839 (original), 840 (patched)
> > 
> >
> > Might be useful to be able to pass in SerDe params which are used to 
> > initialize the SerDe - this could be useful for some SerDes. For example 
> > LazySimpleSerDe allows you to pass in the field separator, or set the 
> > timestamp format etc.
> 
> Deepak Jaiswal wrote:
> I will take a look, thanks.

https://issues.apache.org/jira/browse/HIVE-19478 will follow this up. Thanks 
for bringing this up.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
---


On May 8, 2018, 6:12 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66999/
> ---
> 
> (Updated May 8, 2018, 6:12 a.m.)
> 
> 
> Review request for hive, Jason Dere and Prasanth_J.
> 
> 
> Bugs: HIVE-19453
> https://issues.apache.org/jira/browse/HIVE-19453
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extend the load data statement to take as parameters the inputformat of the 
> source files and the serde to interpret them. For example,
>  
> load data local inpath 
> '../../data/files/load_data_job/partitions/load_data_2_partitions.txt' INTO 
> TABLE srcbucket_mapjoin
> INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
> SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g a837d67b96 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> 2b88ea651b 
>   ql/src/test/queries/clientpositive/load_data_using_job.q 3928f1fa07 
>   ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 
> 116630c237 
> 
> 
> Diff: https://reviews.apache.org/r/66999/diff/1/
> 
> 
> Testing
> ---
> 
> Added a test to load_data_using_job.q
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 66979: HIVE-19374: Parse and process ALTER TABLE SET OWNER command syntax

2018-05-09 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66979/#review202771
---


Fix it, then Ship it!




LGTM. Thanks for the changes.


itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/metadata/TestAlterTableMetadata.java
Lines 1 (patched)


Apache header missing


- Vihang Karajgaonkar


On May 8, 2018, 11:57 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66979/
> ---
> 
> (Updated May 8, 2018, 11:57 p.m.)
> 
> 
> Review request for hive and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19374
> https://issues.apache.org/jira/browse/HIVE-19374
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch implements the new ALTER TABLE ... SET OWNER command and calls the 
> HMS API to change the owner of the table.
> 
> The command syntax is:
> > ALTER TABLE  SET OWNER { USER |GROUP |ROLE 
> >  }
> 
> Currently, Hive sets the owner of a table to the user who created that table. 
> With this command, we will be able to change it to another user, group, or 
> role (as ALTER DATABASE does).
> 
> The changes are:
> - HiveParser.g which adds the new syntax
> - HiveOperation.java which adds the new ALTERTABLE_OWNER operation
> - Table.java which gets/sets the owner type
> - SemanticAnalyzer.java which returns the DDLSemanticAnalyzer if an 
> ALTERTABLE_OWNER operation is detected
> - DDLSemanticAnalyzer.java which analyzes the ALTERTABLE_OWNER Operation
> - AlterTableDesc.java uses by the DDL semantic analyzer to change the new 
> owner information
> - MetaDataFormatUtils which displays the owner type when the DESCRIBE command 
> is called
> - JsonMetaDataFormatted which is another implementation to display the owner 
> type in Json format
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/metadata/TestAlterTableMetadata.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 3141a7e981eb35a9fbc7f367f38f8ad420f1f928 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
> 879b4224494c3a9adb0713f319e586db4865fb17 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java
>  cd70eee26c06ee6476964508c54c2bb10b167530 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
>  af283e693b5a0fc68e35221b2005fcf1910bdb8e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> defb8becdb5d767ae71d5c962afac43f0c068c3c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
> a837d67b9615ca1ee359c7aa26f79b6f2504dd99 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 
> 820046388adbc65664ae36b08aaba72943ccb6af 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 
> a767796a949da3c23ebe6d8c78b995c8638ebfef 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 
> cd4c206a89f1bc1a6195b0f1f39d3c4b462dc027 
> 
> 
> Diff: https://reviews.apache.org/r/66979/diff/2/
> 
> 
> Testing
> ---
> 
> Waiting for HiveQA
> 
> - alter_table_set_owner.q, which verifies that the new command works. Describe 
> is not tested because the .q test files mask the owner information.
> - the describe command verified manually in my local hive environment
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
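Following the syntax in the review above, usage would look like this (the table and principal names here are hypothetical, chosen only for illustration):

```sql
-- hand table ownership to a specific user
ALTER TABLE web_logs SET OWNER USER alice;
-- or to a role, as ALTER DATABASE already allows
ALTER TABLE web_logs SET OWNER ROLE etl_admins;
```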



[jira] [Created] (HIVE-19477) Hiveserver2 in http mode not emitting metric default.General.open_connections

2018-05-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19477:
--

 Summary: Hiveserver2 in http mode not emitting metric 
default.General.open_connections
 Key: HIVE-19477
 URL: https://issues.apache.org/jira/browse/HIVE-19477
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Dinesh Chitlangia
Assignee: Jesus Camacho Rodriguez


Instances in binary mode are emitting the metric 
_default.General.open_connections_ but the instances operating in http mode are 
not emitting this metric.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: May 2018 Hive User Group Meeting

2018-05-09 Thread Luis Figueroa
Hey everyone,

Was the meeting recorded by any chance?

Luis

On May 8, 2018, at 5:31 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Hey Everyone,

Almost time for the meetup! The live stream can be viewed on this link: 
https://live.lifesizecloud.com/extension/2000992219?token=067078ac-a8df-45bc-b84c-4b371ecbc719&name=&locale=en&meeting=Hive%20User%20Group%20Meetup

The stream won't be live until the meetup starts.

For those attending in person, there will be guest wifi:

Login: HiveMeetup
Password: ClouderaHive

On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
Hey Everyone,

The meetup is only a day away! 
Here is a link to all the abstracts we have compiled thus far. Several of you have 
asked about event streaming and recordings. The meetup will be both streamed 
live and recorded. We will post the links on this thread and on the meetup link 
tomorrow closer to the start of the meetup.

The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any trouble 
getting into the building, feel free to post on the meetup link.

Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
Hey Everyone,

The agenda for the meetup has been set and I'm excited to say we have lots of 
interesting talks scheduled! Below is final agenda, the full list of abstracts 
will be sent out soon. If you are planning to attend, please RSVP on the meetup 
link so we can get an accurate headcount of attendees 
(https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).

6:30 - 7:00 PM Networking and Refreshments
7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total

  *   What's new in Hive 3.0.0 - Ashutosh Chauhan
  *   Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
  *   Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
  *   Dali: Data Access Layer at LinkedIn - Adwait Tumbde
  *   Parquet Vectorization in Hive - Vihang Karajgaonkar
  *   ORC Column Level Encryption - Owen O’Malley
  *   Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
  *   Materialized Views in Hive - Jesus Camacho Rodriguez

8:30 PM - 9:00 PM Hive Metastore Panel

  *   Moderator: Vihang Karajgaonkar
  *   Participants:
 *   Daniel Dai - Hive Metastore Caching
 *   Alan Gates - Hive Metastore Separation
 *   Rituparna Agrawal - Customer Use Cases & Pain Points of (Big) Metadata

The Metastore panel will consist of a short presentation by each panelist 
followed by a Q&A session driven by the moderator.

On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
We still have a few slots open for lightning talks, so if anyone is interested 
in giving a presentation don't hesitate to reach out!

If you are planning to attend the meetup, please RSVP on the Meetup link 
(https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so that we 
can get an accurate headcount for food.

Thanks!

--Sahil

On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
Hi all,

I'm happy to announce that the Hive community is organizing a Hive user group 
meeting in the Bay Area next month. The details can be found at 
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

The format of this meetup will be slightly different from previous ones. There 
will be one hour dedicated to lightning talks, followed by a group discussion 
on the future of the Hive Metastore.

We are inviting talk proposals from Hive users as well as developers at this 
time. Please contact either myself 
(takiar.sa...@gmail.com), Vihang Karajgaonkar 
(vih...@cloudera.com), or Peter Vary 
(pv...@cloudera.com) with proposals. We currently 
have 5 openings.

Please let me know if you have any questions or suggestions.

Thanks,
Sahil



--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309



--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309



--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309



--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Review Request 66800: HIVE-6980 Drop table by using direct sql

2018-05-09 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66800/
---

(Updated May 9, 2018, 1 p.m.)


Review request for hive, Alexander Kolbasov, Alan Gates, Marta Kuczora, Adam 
Szita, and Vihang Karajgaonkar.


Bugs: HIVE-6980
https://issues.apache.org/jira/browse/HIVE-6980


Repository: hive-git


Description
---

First version of the patch.

Splits getPartitionsViaSqlFilterInternal into:

getPartitionIdsViaSqlFilter - returns the partition ids
getPartitionsFromPartitionIds - returns the partition data for those partition 
ids

Creates dropPartitionsByPartitionIds, which drops the partitions via direct SQL 
commands.

Creates dropPartitionsViaSqlFilter using getPartitionIdsViaSqlFilter and 
dropPartitionsByPartitionIds.

Modifies the ObjectStore to drop partitions with direct SQL if possible.
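For illustration, dropping partitions by id via direct SQL amounts to deleting the partition rows and their dependent rows from the metastore backing database. The sketch below is an assumption-laden approximation of the HMS relational schema and is likely incomplete; the authoritative statements are in MetaStoreDirectSql in the patch itself.

```sql
-- illustrative sketch only: dependent rows must go before the partition rows;
-- table names follow the HMS schema, :partitionIds is a bound id list
DELETE FROM PART_COL_STATS     WHERE PART_ID IN (:partitionIds);
DELETE FROM PARTITION_PARAMS   WHERE PART_ID IN (:partitionIds);
DELETE FROM PART_COL_PRIVS     WHERE PART_ID IN (:partitionIds);
DELETE FROM PART_PRIVS         WHERE PART_ID IN (:partitionIds);
DELETE FROM PARTITION_KEY_VALS WHERE PART_ID IN (:partitionIds);
DELETE FROM PARTITIONS         WHERE PART_ID IN (:partitionIds);
```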


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
 4e0e887 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 6645e55 


Diff: https://reviews.apache.org/r/66800/diff/3/

Changes: https://reviews.apache.org/r/66800/diff/2-3/


Testing
---

Ran the TestDropPartition tests, and also checked the database manually to verify 
that no objects were left in the database.


Thanks,

Peter Vary



[GitHub] hive pull request #344: HIVE-19476: Fix failures in TestReplicationScenarios...

2018-05-09 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/344

HIVE-19476: Fix failures in TestReplicationScenariosAcidTables, 
TestReplicationOnHDFSEncryptedZones and TestCopyUtils.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-19476

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #344


commit a50f602bb0e38b7b96ae154e667e15afd2fc8b9b
Author: Sankar Hariappan 
Date:   2018-05-09T12:42:21Z

HIVE-19476: Fix failures in TestReplicationScenariosAcidTables, 
TestReplicationOnHDFSEncryptedZones and TestCopyUtils.




---


[jira] [Created] (HIVE-19476) Fix failures in TestReplicationScenariosAcidTables, TestReplicationOnHDFSEncryptedZones and TestCopyUtils

2018-05-09 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-19476:
---

 Summary: Fix failures in TestReplicationScenariosAcidTables, 
TestReplicationOnHDFSEncryptedZones and TestCopyUtils
 Key: HIVE-19476
 URL: https://issues.apache.org/jira/browse/HIVE-19476
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 3.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 3.0.0, 3.1.0


If the incremental dump has a drop of a partitioned table followed by a 
create/insert on a non-partitioned table with the same name, the data is not 
replicated. Explained below.

Let's say we have a partitioned table T1 which was already replicated to target.

DROP_TABLE(T1)->CREATE_TABLE(T1) (Non-partitioned) -> INSERT(T1)(10) 

After REPL LOAD, T1 doesn't have any data.

The same holds for the non-partitioned to partitioned case and the 
partition-spec mismatch case as well.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19475) Issue when streaming data to Azure Data Lake Store

2018-05-09 Thread Thomas Nys (JIRA)
Thomas Nys created HIVE-19475:
-

 Summary: Issue when streaming data to Azure Data Lake Store
 Key: HIVE-19475
 URL: https://issues.apache.org/jira/browse/HIVE-19475
 Project: Hive
  Issue Type: Bug
  Components: Streaming
Affects Versions: 2.2.0
 Environment: HDInsight 3.6 on Ubuntu 16.04.4 LTS (GNU/Linux 
4.13.0-1012-azure x86_64)

Used java libraries:
{code:java}
libraryDependencies += "org.apache.hive.hcatalog" % "hive-hcatalog-streaming" % 
"2.2.0"
libraryDependencies += "org.apache.hive.hcatalog" % "hive-hcatalog-core" % 
"2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.8.0"
{code}
Please let me know if more details are needed.
Reporter: Thomas Nys


I am trying to stream data from a Java application (Play2 API) to an HDInsight 
Hive interactive query cluster with Azure Data Lake Store as the storage 
back-end. The following code is run on one of the head nodes of the cluster.

When fetching a transaction-batch:
{code:java}
TransactionBatch txnBatch = this.connection.fetchTransactionBatch(10, 
(RecordWriter)writer);
{code}
I receive the following error:
{code:java}
play.api.UnexpectedException: Unexpected exception[StreamingIOFailure: Failed 
creating RecordUpdaterS for 
adl://home/hive/warehouse/raw_telemetry_data/ingest_date=2018-05-07 
txnIds[506,515]]
    at 
play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:251)
    at 
play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
    at 
play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:343)
    at 
play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:341)
    at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:414)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at 
akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Failed 
creating RecordUpdaterS for 
adl://home/hive/warehouse/raw_telemetry_data/ingest_date=2018-05-07 
txnIds[506,515]
    at 
org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:208)
    at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.(HiveEndPoint.java:608)
    at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.(HiveEndPoint.java:556)
    at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:442)
    at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:422)
    at hive.HiveRepository.createMany(HiveRepository.java:76)
    at controllers.HiveController.create(HiveController.java:40)
    at router.Routes$$anonfun$routes$1.$anonfun$applyOrElse$2(Routes.scala:70)
    at 
play.core.routing.HandlerInvokerFactory$$anon$4.resultCall(HandlerInvoker.scala:137)
    at 
play.core.routing.HandlerInvokerFactory$JavaActionInvokerFactory$$anon$8$$anon$2$$anon$1.invocation(HandlerInvoker.scala:108)
Caused by: java.io.IOException: No FileSystem for scheme: adl
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.(OrcRecordUpdater.java:187)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:278)
    at 
org.apache.hive.hcatalog.streaming.AbstractRecordWriter.createRecordUpdater(AbstractRecordWriter.java:268){code}
 

Any help would be greatly appreciated.
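The root cause is the last `Caused by` line, `java.io.IOException: No FileSystem for scheme: adl`: the client JVM has no FileSystem implementation registered for the `adl://` scheme. One possible fix, sketched under the assumption that the missing piece is the `hadoop-azure-datalake` module (which ships the ADL FileSystem starting with Hadoop 2.8), is to add it next to the dependencies already listed in the environment above:

```scala
// hypothetical fix sketch: the adl:// scheme is served by a separate Hadoop
// module that must be on the streaming client's classpath
libraryDependencies += "org.apache.hadoop" % "hadoop-azure-datalake" % "2.8.0"
```

If the scheme is still unresolved after that, it may also be necessary to ensure `fs.adl.impl` is set to `org.apache.hadoop.fs.adl.AdlFileSystem` in the client's `core-site.xml` (or on the Hadoop `Configuration` object the writer uses).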




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)