[jira] [Created] (HIVE-14056) Golden file updates for few tests

2016-06-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-14056:
---

 Summary: Golden file updates for few tests
 Key: HIVE-14056
 URL: https://issues.apache.org/jira/browse/HIVE-14056
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: 2.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14055:
---

 Summary: directSql - getting the number of partitions is broken
 Key: HIVE-14055
 URL: https://issues.apache.org/jira/browse/HIVE-14055
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Noticed while looking at something else





[jira] [Created] (HIVE-14054) TestHiveMetaStoreChecker fails on master

2016-06-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-14054:
---

 Summary: TestHiveMetaStoreChecker fails on master 
 Key: HIVE-14054
 URL: https://issues.apache.org/jira/browse/HIVE-14054
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 2.2.0
Reporter: Ashutosh Chauhan








Re: Review Request 40867: HIVE-11527 - bypass HiveServer2 thrift interface for query results

2016-06-17 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40867/#review138347
---




jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java (line 57)


Use ReflectionUtils that exists within Hive ?
The hadoop one is not a public class.
I know that we use the hadoop one in other parts, but that is something we 
should move away from.



jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java (line 535)


Wouldn't it be more performant to use LazyBinarySimpleSerde ?



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 1972)


We should log whether the bypass is kicking in and, if not, the reason for 
disabling it.
There are so many cases where it gets disabled, it would be hard to debug 
in a production environment what the reason is.



service-rpc/if/TCLIService.thrift (line 277)


There is already typeDesc; why do we need typeName?


- Thejas Nair


On June 15, 2016, 6:50 a.m., Takanobu Asanuma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40867/
> ---
> 
> (Updated June 15, 2016, 6:50 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This is a WIP patch for HIVE-11527
> 
> * I added a new configuration whose name is 
> hive.server2.webhdfs.bypass.enabled. The default is false. When this value is 
> true, clients use the bypass.
> 
> * I still have not considered security such as Kerberos and SSL at present.
> 
> * I have not implemented Statement#setFetchSize for bypass yet.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 761dbb2 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHA.java 
> 84644d1 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java 
> 0c313a2 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniMr.java 
> 637e51a 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 92fdbca 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java a242501 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 2263192 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
>   service-rpc/if/TCLIService.thrift 5a9a785 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h d23b3cd 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 0f53cb2 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TColumnDesc.java
>  31472c8 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TExecuteStatementResp.java
>  7101fa5 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetTablesReq.java
>  1aa3f94 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java
>  14d50ed 
>   service-rpc/src/gen/thrift/gen-php/Types.php a6a257f 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py fcd330f 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 71148a0 
>   service/src/java/org/apache/hive/service/cli/CLIService.java ed52b4a 
>   service/src/java/org/apache/hive/service/cli/ColumnDescriptor.java bfd7135 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> d48b92c 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 2f18231 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 3bf40eb 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 78ff388 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> 7341635 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 8bc3d94 
> 
> Diff: https://reviews.apache.org/r/40867/diff/
> 
> 
> Testing
> ---
> 
> I have tested a few simple queries and they worked well. But I think there are 
> some problems for some queries. I'm going to test more queries and fix bugs. 
> I'm also going to add unit tests.
> 
> 
> Thanks,
> 
> Takanobu Asanuma
> 
>



Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Jason Dere
Checked signatures, ran build and a few tests.
+1

From: Sushanth Sowmyan 
Sent: Friday, June 17, 2016 3:30 PM
To: dev@hive.apache.org
Subject: Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

Actually, to be more explicit, per Thejas' case of the top level
license taking precedence, this RC has my +1.

On Fri, Jun 17, 2016 at 3:28 PM, Sushanth Sowmyan  wrote:
> I will happily rescind my -1 and even convert it to a +1 if the top
> level license does hold. I thought that the RAT check was a necessary
> blocker.
>
> (Although, if the top level license does cover across the board, we
> may want to open a new discussion on whether having a license
> requirement for every source file is necessary in the first place, and
> tweak the definition of the rat check so it does not fail it in this
> case.)
>
> On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair  wrote:
>> I don't think the missing headers for 2 files mandates a respin of
>> this RC .  It is not really a case of 'incompatible' license or code
>> that shouldn't be shipped.
>> We have a top level license file that covers the entire project,
>> including these files.
>> IMO, We should fix it if there is a new RC for some other reason. But
>> this alone doesn't seem to make new RC necessary.
>>
>> Sushanth, Can you please reconsider your -1 ?
>>
>>
>> On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan  wrote:
>>> -1, terribly sorry I didn't check for this earlier, but the RAT check
>>> fails for this.
>>>
>>> If you run mvn apache-rat:check , then you see the following issue:
>>>
>>> Unapproved licenses:
>>>
>>>   
>>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java
>>>   
>>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java
>>>
>>> Basically, these two files are missing the apache license header. We
>>> need to add them in.
>>>
>>> All other things are good, though. It has the oracle fix I asked for
>>> in RC2, md5s and signatures check out, compilation works on source
>>> package, and I'm able to run the hive binary from the binary package.
>>> I also tried a number of tests, and I've run a rat test on the release
>>>
>>> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez
>>>  wrote:
>>>> Apache Hive 2.1.0 Release Candidate 3 is available here:
>>>>
>>>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>>>>
>>>> Maven artifacts are available here:
>>>>
>>>> https://repository.apache.org/content/repositories/orgapachehive-1057/
>>>>
>>>> Source tag for RC3 is at:
>>>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>>>>
>>>>
>>>> Voting will conclude in 72 hours.
>>>>
>>>> Hive PMC Members: Please test and vote.
>>>>
>>>> Thanks.






[jira] [Created] (HIVE-14053) Hive should report that primary keys can't be null.

2016-06-17 Thread Carter Shanklin (JIRA)
Carter Shanklin created HIVE-14053:
--

 Summary: Hive should report that primary keys can't be null.
 Key: HIVE-14053
 URL: https://issues.apache.org/jira/browse/HIVE-14053
 Project: Hive
  Issue Type: Bug
Reporter: Carter Shanklin


HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. With 
the right driver in place, tools like Tableau can do join elimination and 
queries can run much faster.

Some gaps remain: currently getAttributes() in HiveDatabaseMetaData doesn't 
work quite right for keys. In particular, primary keys are by definition not 
null, and the metadata should reflect this for improved join elimination.

In this example that uses the TPC-H schema and its constraints, we sum 
l_extendedprice and group by l_shipmode. This query should not use more than 
just the lineitem table.

With all the constraints in place, Tableau generates this query:
{code}
SELECT `lineitem`.`l_shipmode` AS `l_shipmode`,
  SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok`
FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem`
  JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = 
`orders`.`o_orderkey`)
  JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = 
`customer`.`c_custkey`)
  JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = 
`nation`.`n_nationkey`)
WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT 
(`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS NULL)) 
AND (NOT (`lineitem`.`l_suppkey` IS NULL)))) AND (NOT (`nation`.`n_regionkey` 
IS NULL)))
{code}

Since these are the primary keys, the denormalization and the WHERE conditions 
are unnecessary, and this sort of query could be a lot faster by accessing just 
the lineitem table.





Re: Review Request 48233: HIVE-13884: Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Sergio Pena


> On June 17, 2016, 10:20 p.m., Szehon Ho wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
> > line 3179
> > 
> >
> > I actually meant here to get rid of these checks as well (in the two 
> > checkLimitNumberOfPartitionByX methods)

I did that before, but I preferred not to make the extra call to 
'get_num_partitions_by_filter' and 'get_num_partitions_by_expr' if the 
partition limit is not enabled. Any idea how to avoid it?
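
To make the trade-off concrete, here is a minimal Java sketch of the kind of 
guard being discussed: the partition count is fetched only when a limit is 
configured. Everything in it is an assumption for illustration 
(PartitionLimitGuard, the IntSupplier standing in for the metastore count 
call, and treating a negative limit as "disabled"); it is not the code in 
the patch.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch; the names are illustrative, not the HiveMetaStore API. */
final class PartitionLimitGuard {
    /** Assumption of this sketch: a negative limit means the check is disabled. */
    private final int maxPartitions;

    PartitionLimitGuard(int maxPartitions) {
        this.maxPartitions = maxPartitions;
    }

    /**
     * Throws if the request would exceed the limit. The supplier stands in for
     * a count call such as get_num_partitions_by_filter and is invoked only
     * when the limit is enabled.
     */
    void check(IntSupplier partitionCount) {
        if (maxPartitions < 0) {
            return; // limit disabled: skip the extra count call entirely
        }
        int requested = partitionCount.getAsInt();
        if (requested > maxPartitions) {
            throw new IllegalStateException("Requested " + requested
                + " partitions, but the configured limit is " + maxPartitions);
        }
    }
}
```

Passing the count as a supplier keeps the "is the limit enabled" decision in 
one place, so callers would not need their own checks.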


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48233/#review138113
---


On June 17, 2016, 3:18 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48233/
> ---
> 
> (Updated June 17, 2016, 3:18 p.m.)
> 
> 
> Review request for hive, Mohit Sabharwal and Naveen Gangam.
> 
> 
> Bugs: HIVE-13884
> https://issues.apache.org/jira/browse/HIVE-13884
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The patch verifies the # of partitions a table has before fetching any from 
> the metastore. It checks that limit from 
> 'hive.limit.query.max.table.partition'.
> 
> A limitation added here is that the variable must be set in hive-site.xml in 
> order to work; it cannot be set through beeline because HiveMetaStore.java 
> does not read the variables set through beeline. I think it is better to 
> keep it this way to avoid users changing the value on the fly and crashing 
> the metastore.
> 
> Another change is that EXPLAIN commands won't be executed either. EXPLAIN 
> commands need to fetch partitions in order to create the operator tree. If we 
> allow EXPLAIN to do that, then we may have the same OOM situations for large 
> partitions.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> cc950089cf52a0344e1be0c42309d521fb8cb4d6 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> c0827ea9d47e569d9697649a7e16d196de3de14d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> c135179b97354108f842a5ca2de0c6f0ef28b7fc 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> da188d33d6194740ba9ecb37a6e533ecf1ec6906 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
> a6d3f5385b33b8a4e31ee20ca5cb8f58c97c8702 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
> 31f0d7b89670b8a749bbe8a7ff2b4ff9f059a8e2 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  3152e77c3c7152ac4dbe7e779ce35f28044fe3c9 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  86a243609b23e2ca9bb8849f0da863a95e477d5c 
> 
> Diff: https://reviews.apache.org/r/48233/diff/
> 
> 
> Testing
> ---
> 
> Waiting for HiveQA.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Review Request 48886: HIVE-14052: Cleanup of structures required when LLAP access from external clients completes

2016-06-17 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48886/
---

Review request for hive and Siddharth Seth.


Bugs: HIVE-14052
https://issues.apache.org/jira/browse/HIVE-14052


Repository: hive-git


Description
---

Add a hook to run QueryTracker.queryComplete if there are no more 
fragments for this query.
This cleanup runs after a delay and can be cancelled if another fragment 
request comes in with the same query ID.
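
The idea described above could be sketched roughly as follows. This is an 
illustration only, not the code in the patch: DelayedQueryCleanup, its method 
names, and the ticket scheme are all hypothetical. A cleanup is scheduled when 
the last known fragment finishes, and a later fragment for the same query ID 
invalidates it before it fires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of delayed, cancellable per-query cleanup. */
final class DelayedQueryCleanup {
    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "query-cleanup");
            t.setDaemon(true);
            return t;
        });
    // queryId -> ticket of the cleanup currently pending for that query.
    private final Map<String, Long> pending = new HashMap<>();
    private final Consumer<String> onQueryComplete;
    private final long delayMillis;
    private long nextTicket = 0;

    DelayedQueryCleanup(Consumer<String> onQueryComplete, long delayMillis) {
        this.onQueryComplete = onQueryComplete;
        this.delayMillis = delayMillis;
    }

    /** Called when the last known fragment of a query finishes. */
    synchronized void lastFragmentFinished(String queryId) {
        long ticket = nextTicket++;
        pending.put(queryId, ticket); // supersedes any earlier pending cleanup
        scheduler.schedule(() -> runIfStillPending(queryId, ticket),
            delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Called when a new fragment arrives; returns true if a cleanup was cancelled. */
    synchronized boolean fragmentArrived(String queryId) {
        return pending.remove(queryId) != null;
    }

    /** Runs the cleanup only if no newer event has invalidated this ticket. */
    private synchronized void runIfStillPending(String queryId, long ticket) {
        Long current = pending.get(queryId);
        if (current != null && current == ticket) {
            pending.remove(queryId);
            onQueryComplete.accept(queryId);
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Using a ticket rather than cancelling the scheduled task avoids a race where 
a cleanup that has already started firing would otherwise run for a query 
that has just received a new fragment.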


Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
 ded84c1 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java 
c7e9d32 
  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
a965872 
  ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java 825488f 
  ql/src/test/org/apache/hadoop/hive/llap/TestLlapOutputFormat.java 2288cd4 

Diff: https://reviews.apache.org/r/48886/diff/


Testing
---


Thanks,

Jason Dere



Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Sushanth Sowmyan
Actually, to be more explicit, per Thejas' case of the top level
license taking precedence, this RC has my +1.

On Fri, Jun 17, 2016 at 3:28 PM, Sushanth Sowmyan  wrote:
> I will happily rescind my -1 and even convert it to a +1 if the top
> level license does hold. I thought that the RAT check was a necessary
> blocker.
>
> (Although, if the top level license does cover across the board, we
> may want to open a new discussion on whether having a license
> requirement for every source file is necessary in the first place, and
> tweak the definition of the rat check so it does not fail it in this
> case.)
>
> On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair  wrote:
>> I don't think the missing headers for 2 files mandates a respin of
>> this RC .  It is not really a case of 'incompatible' license or code
>> that shouldn't be shipped.
>> We have a top level license file that covers the entire project,
>> including these files.
>> IMO, We should fix it if there is a new RC for some other reason. But
>> this alone doesn't seem to make new RC necessary.
>>
>> Sushanth, Can you please reconsider your -1 ?
>>
>>
>> On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan  wrote:
>>> -1, terribly sorry I didn't check for this earlier, but the RAT check
>>> fails for this.
>>>
>>> If you run mvn apache-rat:check , then you see the following issue:
>>>
>>> Unapproved licenses:
>>>
>>>   
>>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java
>>>   
>>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java
>>>
>>> Basically, these two files are missing the apache license header. We
>>> need to add them in.
>>>
>>> All other things are good, though. It has the oracle fix I asked for
>>> in RC2, md5s and signatures check out, compilation works on source
>>> package, and I'm able to run the hive binary from the binary package.
>>> I also tried a number of tests, and I've run a rat test on the release
>>>
>>> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez
>>>  wrote:
>>>> Apache Hive 2.1.0 Release Candidate 3 is available here:
>>>>
>>>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>>>>
>>>> Maven artifacts are available here:
>>>>
>>>> https://repository.apache.org/content/repositories/orgapachehive-1057/
>>>>
>>>> Source tag for RC3 is at:
>>>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>>>>
>>>>
>>>> Voting will conclude in 72 hours.
>>>>
>>>> Hive PMC Members: Please test and vote.
>>>>
>>>> Thanks.






Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Sushanth Sowmyan
I will happily rescind my -1 and even convert it to a +1 if the top
level license does hold. I thought that the RAT check was a necessary
blocker.

(Although, if the top level license does cover across the board, we
may want to open a new discussion on whether having a license
requirement for every source file is necessary in the first place, and
tweak the definition of the rat check so it does not fail it in this
case.)

On Fri, Jun 17, 2016 at 3:20 PM, Thejas Nair  wrote:
> I don't think the missing headers for 2 files mandates a respin of
> this RC .  It is not really a case of 'incompatible' license or code
> that shouldn't be shipped.
> We have a top level license file that covers the entire project,
> including these files.
> IMO, We should fix it if there is a new RC for some other reason. But
> this alone doesn't seem to make new RC necessary.
>
> Sushanth, Can you please reconsider your -1 ?
>
>
> On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan  wrote:
>> -1, terribly sorry I didn't check for this earlier, but the RAT check
>> fails for this.
>>
>> If you run mvn apache-rat:check , then you see the following issue:
>>
>> Unapproved licenses:
>>
>>   
>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java
>>   
>> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java
>>
>> Basically, these two files are missing the apache license header. We
>> need to add them in.
>>
>> All other things are good, though. It has the oracle fix I asked for
>> in RC2, md5s and signatures check out, compilation works on source
>> package, and I'm able to run the hive binary from the binary package.
>> I also tried a number of tests, and I've run a rat test on the release
>>
>> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez
>>  wrote:
>>> Apache Hive 2.1.0 Release Candidate 3 is available here:
>>>
>>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>>>
>>> Maven artifacts are available here:
>>>
>>> https://repository.apache.org/content/repositories/orgapachehive-1057/
>>>
>>> Source tag for RC3 is at:
>>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>>>
>>>
>>> Voting will conclude in 72 hours.
>>>
>>> Hive PMC Members: Please test and vote.
>>>
>>> Thanks.
>>>
>>>
>>>
>>>


[jira] [Created] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes

2016-06-17 Thread Jason Dere (JIRA)
Jason Dere created HIVE-14052:
-

 Summary: Cleanup of structures required when LLAP access from 
external clients completes
 Key: HIVE-14052
 URL: https://issues.apache.org/jira/browse/HIVE-14052
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Jason Dere
Assignee: Jason Dere


Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP to 
track a query will keep building up slowly over time.






Re: Review Request 48233: HIVE-13884: Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48233/#review138113
---



Looks good to me.  Just a follow up on the previous comment, can you now remove 
those two methods before commit?


common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (line 780)






metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (line 
3179)


I actually meant here to get rid of these checks as well (in the two 
checkLimitNumberOfPartitionByX methods)


- Szehon Ho


On June 17, 2016, 3:18 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48233/
> ---
> 
> (Updated June 17, 2016, 3:18 p.m.)
> 
> 
> Review request for hive, Mohit Sabharwal and Naveen Gangam.
> 
> 
> Bugs: HIVE-13884
> https://issues.apache.org/jira/browse/HIVE-13884
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The patch verifies the # of partitions a table has before fetching any from 
> the metastore. It checks that limit from 
> 'hive.limit.query.max.table.partition'.
> 
> A limitation added here is that the variable must be set in hive-site.xml in 
> order to work; it cannot be set through beeline because HiveMetaStore.java 
> does not read the variables set through beeline. I think it is better to 
> keep it this way to avoid users changing the value on the fly and crashing 
> the metastore.
> 
> Another change is that EXPLAIN commands won't be executed either. EXPLAIN 
> commands need to fetch partitions in order to create the operator tree. If we 
> allow EXPLAIN to do that, then we may have the same OOM situations for large 
> partitions.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> cc950089cf52a0344e1be0c42309d521fb8cb4d6 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> c0827ea9d47e569d9697649a7e16d196de3de14d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> c135179b97354108f842a5ca2de0c6f0ef28b7fc 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> da188d33d6194740ba9ecb37a6e533ecf1ec6906 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
> a6d3f5385b33b8a4e31ee20ca5cb8f58c97c8702 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
> 31f0d7b89670b8a749bbe8a7ff2b4ff9f059a8e2 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  3152e77c3c7152ac4dbe7e779ce35f28044fe3c9 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  86a243609b23e2ca9bb8849f0da863a95e477d5c 
> 
> Diff: https://reviews.apache.org/r/48233/diff/
> 
> 
> Testing
> ---
> 
> Waiting for HiveQA.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Thejas Nair
I don't think the missing headers for 2 files mandates a respin of
this RC .  It is not really a case of 'incompatible' license or code
that shouldn't be shipped.
We have a top level license file that covers the entire project,
including these files.
IMO, We should fix it if there is a new RC for some other reason. But
this alone doesn't seem to make new RC necessary.

Sushanth, Can you please reconsider your -1 ?


On Fri, Jun 17, 2016 at 3:06 PM, Sushanth Sowmyan  wrote:
> -1, terribly sorry I didn't check for this earlier, but the RAT check
> fails for this.
>
> If you run mvn apache-rat:check , then you see the following issue:
>
> Unapproved licenses:
>
>   
> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java
>   
> /Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java
>
> Basically, these two files are missing the apache license header. We
> need to add them in.
>
> All other things are good, though. It has the oracle fix I asked for
> in RC2, md5s and signatures check out, compilation works on source
> package, and I'm able to run the hive binary from the binary package.
> I also tried a number of tests, and I've run a rat test on the release
>
> On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez
>  wrote:
>> Apache Hive 2.1.0 Release Candidate 3 is available here:
>>
>> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>>
>> Maven artifacts are available here:
>>
>> https://repository.apache.org/content/repositories/orgapachehive-1057/
>>
>> Source tag for RC3 is at:
>> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>>
>>
>> Voting will conclude in 72 hours.
>>
>> Hive PMC Members: Please test and vote.
>>
>> Thanks.
>>
>>
>>
>>


Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Sushanth Sowmyan
-1, terribly sorry I didn't check for this earlier, but the RAT check
fails for this.

If you run mvn apache-rat:check , then you see the following issue:

Unapproved licenses:

  
/Users/sush/t/rel/apache-hive-2.1.0-src/common/src/java/org/apache/hive/common/util/DateParser.java
  
/Users/sush/t/rel/apache-hive-2.1.0-src/common/src/test/org/apache/hive/common/util/TestDateParser.java

Basically, these two files are missing the apache license header. We
need to add them in.
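
For readers unfamiliar with what such a check looks for, here is a rough 
Java sketch. It is not how Apache RAT itself is implemented; 
LicenseHeaderCheck, the 20-line window, and matching on a single phrase from 
the standard ASF source header are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified header check; not the Apache RAT implementation. */
final class LicenseHeaderCheck {
    /** Opening phrase of the standard ASF source header. */
    private static final String MARKER =
        "Licensed to the Apache Software Foundation (ASF)";
    /** Only the first lines are inspected (an arbitrary choice for this sketch). */
    private static final int HEADER_LINES = 20;

    private LicenseHeaderCheck() {}

    /** Returns true if the marker phrase appears near the top of the source text. */
    static boolean hasHeader(String source) {
        String[] lines = source.split("\\R");
        int limit = Math.min(lines.length, HEADER_LINES);
        for (int i = 0; i < limit; i++) {
            if (lines[i].contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the names of files (name -> contents) that lack the header. */
    static List<String> missing(Map<String, String> files) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            if (!hasHeader(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```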

All other things are good, though. It has the oracle fix I asked for
in RC2, md5s and signatures check out, compilation works on source
package, and I'm able to run the hive binary from the binary package.
I also tried a number of tests, and I've run a rat test on the release

On Thu, Jun 16, 2016 at 6:02 PM, Jesus Camacho Rodriguez
 wrote:
> Apache Hive 2.1.0 Release Candidate 3 is available here:
>
> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>
> Maven artifacts are available here:
>
> https://repository.apache.org/content/repositories/orgapachehive-1057/
>
> Source tag for RC3 is at:
> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>
>
>
>


Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Gunther Hagleitner
+1

- verified signature and checksum for src and bin
- compiled from source
- ran a number of tests
- verified structure/contents of the binary/src package

Thanks,
Gunther.

From: Prasanth Jayachandran 
Sent: Friday, June 17, 2016 2:40 PM
To: dev@hive.apache.org
Subject: Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

+1
- verified signature and checksum for binary and source
- compiled source and ran some unit tests
- ran hive cli from binary and ran some queries using tez

Thanks
Prasanth

> On Jun 16, 2016, at 6:02 PM, Jesus Camacho Rodriguez 
>  wrote:
>
> Apache Hive 2.1.0 Release Candidate 3 is available here:
>
> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
>
> Maven artifacts are available here:
>
> https://repository.apache.org/content/repositories/orgapachehive-1057/
>
> Source tag for RC3 is at:
> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
>
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>
>
>
>




Re: [VOTE] Apache Hive 2.1.0 Release Candidate 3

2016-06-17 Thread Prasanth Jayachandran
+1
- verified signature and checksum for binary and source
- compiled source and ran some unit tests
- ran hive cli from binary and ran some queries using tez

Thanks
Prasanth

> On Jun 16, 2016, at 6:02 PM, Jesus Camacho Rodriguez 
>  wrote:
> 
> Apache Hive 2.1.0 Release Candidate 3 is available here:
> 
> http://people.apache.org/~jcamacho/hive-2.1.0-rc3
> 
> Maven artifacts are available here:
> 
> https://repository.apache.org/content/repositories/orgapachehive-1057/
> 
> Source tag for RC3 is at:
> https://github.com/apache/hive/releases/tag/release-2.1.0-rc3
> 
> 
> Voting will conclude in 72 hours.
> 
> Hive PMC Members: Please test and vote.
> 
> Thanks.
> 
> 
> 
> 



[jira] [Created] (HIVE-14051) Custom authentication in Hive JDBC

2016-06-17 Thread Vinoth Sathappan (JIRA)
Vinoth Sathappan created HIVE-14051:
---

 Summary: Custom authentication in Hive JDBC
 Key: HIVE-14051
 URL: https://issues.apache.org/jira/browse/HIVE-14051
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Reporter: Vinoth Sathappan
Assignee: Vinoth Sathappan


Enable the JDBC driver for Hive to use a pluggable module to connect to HS2 
behind gateways using OAuth, OpenID Connect, etc. 





[jira] [Created] (HIVE-14050) Hive attempts to 'chgrp' files on s3a://

2016-06-17 Thread Sean Roberts (JIRA)
Sean Roberts created HIVE-14050:
---

 Summary: Hive attempts to 'chgrp' files on s3a://
 Key: HIVE-14050
 URL: https://issues.apache.org/jira/browse/HIVE-14050
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Sean Roberts
Assignee: Chris Nauroth


When inserting to a table on s3a://, Hive attempts to `chgrp` the files but 
files in s3a:// do not have group ownership.

{code}
hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Full output of the query here:
{code}
hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Query ID = admin_20160617201151_5f953fbe-acde-4774-9ad7-06cffc76dd72
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id 
application_1466165341299_0011)


VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..   SUCCEEDED      1          1        0        0       0       0

VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 8.71 s

Loading data to table mydb.inventory
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Table mydb.inventory stats: [numFiles=12, numRows=6020352, totalSize=25250706, 
rawDataSize=96325632]
OK
Time taken: 19.123 seconds
{code}

The table:
{code}
CREATE TABLE IF NOT EXISTS inventory
   (
MONTH_ID int,
ITEM_ID int,
BOH_QTY float,
EOH_QTY float
   ) row format delimited fields terminated by '|' escaped by '\\' stored as ORC
LOCATION 's3a://mybucket/hive/warehouse/mydb.db/inventory'
tblproperties ("orc.compress"="SNAPPY");
{code}





Re: Review Request 48159: HIVE-13901: Hivemetastore add partitions can be slow depending on filesystems

2016-06-17 Thread Sergey Shelukhin


> On June 14, 2016, 6:31 p.m., Sergey Shelukhin wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
> > line 365
> > 
> >
> > two nested synchronized-s on the same thing.
> > Also nit: is it possible to have an object to sync on, rather than 
> > syncing on something with global visibility?
> 
> Rajesh Balamohan wrote:
> removed a nesting. Retaining sync on HMSHandler.class, as threadPool 
> itself is static.

I was more concerned with visibility, rather than static. I.e. when SomeClass 
does synchronized (this) it can backfire if someone decides to save on Object-s 
and instead does SomeClass foo; synchronized (foo). I guess with .class it's 
much less likely, so it should be ok
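
For illustration, the "object to sync on" alternative from the original 
comment might look like the sketch below. PoolHolder and its members are 
hypothetical names, not the code in HiveMetaStore; the point is only that a 
private lock object cannot be reached by outside code, whereas a class 
literal or `this` can.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder class; the names are illustrative, not from HiveMetaStore. */
final class PoolHolder {
    // Private lock object: code outside this class cannot synchronize on it,
    // unlike PoolHolder.class, which any caller can reach.
    private static final Object POOL_LOCK = new Object();
    private static ExecutorService threadPool; // guarded by POOL_LOCK

    private PoolHolder() {}

    /** Lazily creates the shared pool; later calls return the same instance. */
    static ExecutorService getOrCreatePool(int size) {
        synchronized (POOL_LOCK) {
            if (threadPool == null) {
                threadPool = Executors.newFixedThreadPool(size);
            }
            return threadPool;
        }
    }

    /** Shuts the pool down and clears the reference so it can be recreated. */
    static void shutdownPool() {
        synchronized (POOL_LOCK) {
            if (threadPool != null) {
                threadPool.shutdown();
                threadPool = null;
            }
        }
    }
}
```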


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48159/#review137561
---


On June 17, 2016, 5:56 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48159/
> ---
> 
> (Updated June 17, 2016, 5:56 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13901
> https://issues.apache.org/jira/browse/HIVE-13901
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Depending on FS, creating external tables & adding partitions can be 
> expensive (e.g msck which adds all partitions).
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc95008 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> c0827ea 
> 
> Diff: https://reviews.apache.org/r/48159/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



[jira] [Created] (HIVE-14049) Password prompt in Beeline is continuously printed

2016-06-17 Thread Abdullah Yousufi (JIRA)
Abdullah Yousufi created HIVE-14049:
---

 Summary: Password prompt in Beeline is continuously printed
 Key: HIVE-14049
 URL: https://issues.apache.org/jira/browse/HIVE-14049
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 2.0.1
Reporter: Abdullah Yousufi


I'm experiencing this issue on a Mac; it was not occurring until recently.

{code}
Beeline version 2.2.0-SNAPSHOT by Apache Hive
beeline> !connect jdbc:hive2://localhost:1
Connecting to jdbc:hive2://localhost:1
Enter username for jdbc:hive2://localhost:1: hive
Enter password for jdbc:hive2://localhost:1:
Enter password for jdbc:hive2://localhost:1:
Enter password for jdbc:hive2://localhost:1:
...
{code}

The 'Enter password for jdbc:hive2://localhost:1:' line continues to print 
until Enter is hit. From looking at the code in Commands.java (lines 
1413-1420), it's not quite clear why this happens on the second call to 
readLine():
{code}
if (username == null) {
  username = beeLine.getConsoleReader().readLine("Enter username for " + url + ": ");
}
props.setProperty("user", username);
if (password == null) {
  password = beeLine.getConsoleReader().readLine("Enter password for " + url + ": ",
      new Character('*'));
}
{code}






[jira] [Created] (HIVE-14048) patch for HIVE-4570 removes protected fields which can break dependencies

2016-06-17 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-14048:
--

 Summary: patch for HIVE-4570 removes protected fields which can 
break dependencies
 Key: HIVE-14048
 URL: https://issues.apache.org/jira/browse/HIVE-14048
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
Priority: Critical


The patch for HIVE-4570 removes protected fields such as initialized, isDone, 
and started, and introduces a TaskState enum to represent them. Because the 
removed fields were protected, any class that extends Task.java (or 
DDLTask.java) and uses them will fail to compile once it picks up this patch.

The protected fields should probably be marked deprecated rather than removed 
outright, since removing them could break outside dependencies.
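
A minimal sketch of what such a deprecation could look like, assuming the old boolean fields are kept and mirrored from the new enum. The class names and the enum constants here are hypothetical; the real TaskState in the HIVE-4570 patch may define different states.

```java
// Hypothetical simplification of a Task base class; not the actual Hive code.
class TaskSketch {
    // Assumed states, for illustration only.
    enum TaskState { CREATED, INITIALIZED, RUNNING, FINISHED }

    private TaskState state = TaskState.CREATED;

    /** @deprecated kept so subclasses compiled against the old API still build. */
    @Deprecated protected boolean initialized;
    /** @deprecated use the task state instead. */
    @Deprecated protected boolean started;
    /** @deprecated use the task state instead. */
    @Deprecated protected boolean isDone;

    protected void setState(TaskState newState) {
        state = newState;
        // Mirror the enum into the legacy fields so old readers see consistent values.
        initialized = newState.ordinal() >= TaskState.INITIALIZED.ordinal();
        started = newState.ordinal() >= TaskState.RUNNING.ordinal();
        isDone = newState == TaskState.FINISHED;
    }

    TaskState getState() {
        return state;
    }

    public static void main(String[] args) {
        LegacyTaskSketch task = new LegacyTaskSketch();
        task.advance(TaskState.RUNNING);
        System.out.println(task.isActive());
    }
}

// Stands in for an external subclass that still reads the old protected fields.
class LegacyTaskSketch extends TaskSketch {
    boolean isActive() {
        return initialized && !isDone;
    }

    void advance(TaskState next) {
        setState(next);
    }
}
```

One limitation of this sketch: the mirroring is one-way, so a subclass that writes to the deprecated fields directly would not update the enum state.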





[jira] [Created] (HIVE-14047) add primary key on WRITE_SET

2016-06-17 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-14047:


 Summary: add primary key on WRITE_SET
 Key: HIVE-14047
 URL: https://issues.apache.org/jira/browse/HIVE-14047
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.1.0
Reporter: Thejas M Nair


The WRITE_SET table created in HIVE-13395 should have some columns in the 
primary key. I expect most databases to organize the data in a b-tree with the 
primary key as the index (or have an option to do so). That should help in 
reducing the search space for your prominent queries. As long as the columns in 
the where clause match a prefix of the index, it should greatly reduce the 
search space.
You can add an autoincrement column to keep it unique if necessary. MySQL 
(InnoDB) anyway ends up organizing data on an autoincrement column, which is 
useless for the queries (see post ).
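
To make the prefix argument concrete, here is a rough Java sketch that models a primary-key b-tree as a sorted map over a composite key. The key layout (database, table, transaction id) and all names are assumptions for illustration and are not the actual WRITE_SET schema; a lookup that fixes the leading columns becomes a range scan instead of a full scan.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Models an index ordered by a composite key; a TreeMap stands in for the b-tree.
class PrefixIndexSketch {
    // Separator that sorts below any printable character.
    private static final char SEP = '\u0000';

    // Build the composite key; the transaction id is zero-padded so it sorts numerically.
    static String key(String db, String table, long txnId) {
        return db + SEP + table + SEP + String.format("%019d", txnId);
    }

    // Range scan over all entries whose key starts with (db, table).
    static SortedMap<String, String> scanPrefix(TreeMap<String, String> index,
                                                String db, String table) {
        String prefix = db + SEP + table + SEP;
        // subMap is [from, to): everything starting with the prefix falls in this range.
        return index.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    static TreeMap<String, String> sampleIndex() {
        TreeMap<String, String> index = new TreeMap<>();
        index.put(key("db1", "t1", 1), "row-a");
        index.put(key("db1", "t1", 2), "row-b");
        index.put(key("db1", "t10", 3), "row-c");
        index.put(key("db2", "t1", 4), "row-d");
        return index;
    }

    public static void main(String[] args) {
        // prints 2
        System.out.println(scanPrefix(sampleIndex(), "db1", "t1").size());
    }
}
```

A query that only constrains a trailing column (here, the transaction id alone) gets no such range and has to visit every entry, which is the point made above about where-clause columns matching the index prefix.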





[jira] [Created] (HIVE-14046) Issue with more than 128 "distinct" expressions in Hive

2016-06-17 Thread Abhishek (JIRA)
Abhishek created HIVE-14046:
---

 Summary: Issue with more than 128 "distinct" expressions in Hive
 Key: HIVE-14046
 URL: https://issues.apache.org/jira/browse/HIVE-14046
 Project: Hive
  Issue Type: Bug
  Components: Hive
 Environment: HDP-2.3.4.0-3485
Hive1.2.1.2.3
HBase   1.1.1.2.3
Reporter: Abhishek


One of our users reported an issue with more than 128 "distinct" expressions in 
Hive.

Creating a simple HBase/Hive DB also shows a similar issue.

On further checking, we found a related issue that is still open. It looks like 
this is a known issue that has not been reliably reproduced:
https://issues.apache.org/jira/browse/HIVE-6998

Are there any ideas or workarounds to address this issue, or is there a Hive 
version in which it has been fixed?






Re: Review Request 48233: HIVE-13884: Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48233/
---

(Updated June 17, 2016, 3:18 p.m.)


Review request for hive, Mohit Sabharwal and Naveen Gangam.


Changes
---

Address changes mentioned by Mohit and Szehon.


Bugs: HIVE-13884
https://issues.apache.org/jira/browse/HIVE-13884


Repository: hive-git


Description
---

The patch verifies the # of partitions a table has before fetching any from the 
metastore. It checks the count against the limit in 
'hive.limit.query.max.table.partition'.

A limitation added here is that the variable must be set in hive-site.xml in 
order to work; it cannot be set through beeline because HiveMetaStore.java does 
not read variables set through beeline. I think it is better to keep it this 
way to avoid users changing the value on the fly and crashing the metastore.

Another change is that EXPLAIN commands won't be executed either. EXPLAIN 
commands need to fetch partitions in order to create the operator tree. If we 
allow EXPLAIN to do that, then we may have the same OOM situations for large 
partitions.
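
The idea of counting first and fetching only when the count is within the limit can be sketched roughly as follows. This is not the code from the patch: the class and method names are hypothetical, and treating a negative limit as "no limit" is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical helper: run a cheap count before an expensive partition fetch.
class PartitionLimitSketch {
    static List<String> fetchWithLimit(IntSupplier cheapCount,
                                       Supplier<List<String>> expensiveFetch,
                                       int maxPartitions) {
        int count = cheapCount.getAsInt();
        // Assumption: a negative limit disables the check.
        if (maxPartitions >= 0 && count > maxPartitions) {
            throw new IllegalStateException("Query would fetch " + count
                + " partitions, which exceeds the configured limit of " + maxPartitions);
        }
        // Only reached when the count is within the limit.
        return expensiveFetch.get();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("month_id=1", "month_id=2", "month_id=3");
        System.out.println(fetchWithLimit(parts::size, () -> parts, 5).size());
        try {
            fetchWithLimit(parts::size, () -> parts, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the fetch is passed as a supplier, it is never invoked when the limit is exceeded, which is what protects the metastore from materializing a very large partition list.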


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
cc950089cf52a0344e1be0c42309d521fb8cb4d6 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
c0827ea9d47e569d9697649a7e16d196de3de14d 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
c135179b97354108f842a5ca2de0c6f0ef28b7fc 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
da188d33d6194740ba9ecb37a6e533ecf1ec6906 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
a6d3f5385b33b8a4e31ee20ca5cb8f58c97c8702 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
31f0d7b89670b8a749bbe8a7ff2b4ff9f059a8e2 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 3152e77c3c7152ac4dbe7e779ce35f28044fe3c9 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 86a243609b23e2ca9bb8849f0da863a95e477d5c 

Diff: https://reviews.apache.org/r/48233/diff/


Testing
---

Waiting for HiveQA.


Thanks,

Sergio Pena



Re: Review Request 48233: HIVE-13884: Disallow queries fetching more than a configured number of partitions in PartitionPruner

2016-06-17 Thread Sergio Pena


> On June 16, 2016, 9:24 p.m., Szehon Ho wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 780
> > 
> >
> > Should we add this to 'metaVars' variable?  Reading the doc, it seems 
> > it will affect HiveCLI and allow those users to change it on the fly.
> 
> Sergio Pena wrote:
> So is 'metaVars' used to prevent users from changing it on the fly, or to 
> update the metastore when the values are changed on the fly? I did not 
> understand the code comment very well.
> 
> Szehon Ho wrote:
> I think it recreates it in the case of an embedded metastore.
> 
> It is just a suggestion, in case this is the behavior we want for this 
> flag.  Judging from other flags in this list, it seems like it would fit.

Just did a test, and the user can change the flag on the fly.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48233/#review138089
---


On June 16, 2016, 4:04 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48233/
> ---
> 
> (Updated June 16, 2016, 4:04 p.m.)
> 
> 
> Review request for hive, Mohit Sabharwal and Naveen Gangam.
> 
> 
> Bugs: HIVE-13884
> https://issues.apache.org/jira/browse/HIVE-13884
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The patch verifies the # of partitions a table has before fetching any from 
> the metastore. It checks that limit from 
> 'hive.limit.query.max.table.partition'.
> 
> A limitation added here is that the variable must be on hive-site.xml in 
> order to work, and it does not accept to set this through beeline because 
> HiveMetaStore.java does not read the variables set through beeline. I think 
> it is better to keep it this way to avoid users changing the value on fly, 
> and crashing the metastore.
> 
> Another change is that EXPLAIN commands won't be executed either. EXPLAIN 
> commands need to fetch partitions in order to create the operator tree. If we 
> allow EXPLAIN to do that, then we may have the same OOM situations for large 
> partitions.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 761dbb279fb196e2bf1e0e59824827a4504eb136 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> c0827ea9d47e569d9697649a7e16d196de3de14d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> c135179b97354108f842a5ca2de0c6f0ef28b7fc 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> da188d33d6194740ba9ecb37a6e533ecf1ec6906 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
> a6d3f5385b33b8a4e31ee20ca5cb8f58c97c8702 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
> 31f0d7b89670b8a749bbe8a7ff2b4ff9f059a8e2 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  3152e77c3c7152ac4dbe7e779ce35f28044fe3c9 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  86a243609b23e2ca9bb8849f0da863a95e477d5c 
> 
> Diff: https://reviews.apache.org/r/48233/diff/
> 
> 
> Testing
> ---
> 
> Waiting for HiveQA.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



[jira] [Created] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method

2016-06-17 Thread Matt McCline (JIRA)
Matt McCline created HIVE-14045:
---

 Summary: (Vectorization) Add missing case for BINARY in 
VectorizationContext.getNormalizedName method
 Key: HIVE-14045
 URL: https://issues.apache.org/jira/browse/HIVE-14045
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 2.2.0


Missing case for BINARY data type.





[jira] [Created] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2016-06-17 Thread David Nies (JIRA)
David Nies created HIVE-14044:
-

 Summary: Newlines in Avro maps cause external table to return 
corrupt values
 Key: HIVE-14044
 URL: https://issues.apache.org/jira/browse/HIVE-14044
 Project: Hive
  Issue Type: Bug
 Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with cloudera 5.5.1)
Reporter: David Nies


When {{\n}} characters are contained in Avro files that are used as the data 
source for an external table, the result of {{SELECT}} queries may be corrupt. 
I encountered this error when querying Hive both from {{beeline}} and from JDBC.

h3. Steps to reproduce (used files are attached to ticket)

# Create an {{.avro}} file that contains newline characters in a value of a map:
{code}
avro-tools fromjson --schema-file test.schema test.json > test.avro
{code}
# Copy {{.avro}} file to HDFS
{code}
hdfs dfs -copyFromLocal test.avro /some/location/
{code}
# Create an external table in beeline containing this {{.avro}}:
{code}
beeline> CREATE EXTERNAL TABLE broken_newline_map
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/dnies/hive-test/broken-newline/db'
TBLPROPERTIES ('avro.schema.literal'='
{
  "type" : "record",
  "name" : "myEntry",
  "namespace" : "myNamespace",
  "fields" : [ {
"name" : "foo",
"type" : "long"
  }, {
"name" : "bar",
"type" : {
  "type" : "map",
  "values" : "string"
}
  } ]
}
');
{code}
# Now, selecting may return corrupt results:
{code}
jdbc:my-server:1/> select * from broken_newline_map;
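+-------------------------+---------------------------------------------------+--+
| broken_newline_map.foo  | broken_newline_map.bar                            |
+-------------------------+---------------------------------------------------+--+
| 1                       | {"key2":"value2","key1":"value1\nafter newline"}  |
| 2                       | {"key2":"new value2","key1":"new value"}          |
+-------------------------+---------------------------------------------------+--+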
2 rows selected (1.661 seconds)

jdbc:hive2://my-server:1/> select foo, map_keys(bar), map_values(bar) from broken_newline_map;
+-------+------------------+-----------------------------+--+
|  foo  |       _c1        |             _c2             |
+-------+------------------+-----------------------------+--+
| 1     | ["key2","key1"]  | ["value2","value1"]         |
| NULL  | NULL             | NULL                        |
| 2     | ["key2","key1"]  | ["new value2","new value"]  |
+-------+------------------+-----------------------------+--+
3 rows selected (28.05 seconds)
{code}

Obviously, the last result set contains corrupt entries (line 2). I also 
encountered this when doing this query with JDBC. 


