[jira] [Created] (HIVE-19313) TestJdbcWithDBTokenStoreNoDoAs tests are failing

2018-04-25 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-19313:
---

 Summary: TestJdbcWithDBTokenStoreNoDoAs tests are failing
 Key: HIVE-19313
 URL: https://issues.apache.org/jira/browse/HIVE-19313
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Reporter: Ashutosh Chauhan
Assignee: Thejas M Nair


{code}
Stacktrace
java.sql.SQLException: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:37606/default;principal=hive/localh...@example.com;: 
java.net.ConnectException: Connection refused
at org.apache.hive.jdbc.HiveConnection.&lt;init&gt;(HiveConnection.java:252)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.hive.minikdc.TestJdbcWithMiniKdc.testRenewDelegationToken(TestJdbcWithMiniKdc.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}

Failing repeatedly in Hive QA builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: ptest queue

2018-04-25 Thread Deepak Jaiswal
+1 for option 3. Thanks Adam for taking this up again.

Regards,
Deepak

On 4/25/18, 4:54 PM, "Thejas Nair"  wrote:

Option 3 seems reasonable. I believe that used to be the state a while
back (maybe 12 months back or so).
When 2nd ptest for same jira runs, it checks if the latest patch has
already been run.


On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary  wrote:
> I would vote for version 3. It would solve the big patch problem, and
> removes the unnecessary test runs too.
>
> Thanks,
> Peter
>
>> On Apr 25, 2018, at 11:01 AM, Adam Szita  wrote:
>>
>> Hi all,
>>
>> I had a patch (HIVE-19077) committed with the original aim of
>> preventing wasted resources when running ptest on the same patch
>> multiple times.
>> It is supposed to manage scenarios where a developer uploads
>> HIVE-XYZ.1.patch, which gets queued in Jenkins; then, before execution,
>> HIVE-XYZ.2.patch (for the same jira) is uploaded and gets queued as well.
>> When the first job starts to execute, ptest sees that patch 2 is the
>> latest patch and uses that. Some time later the second queued job also
>> runs on this very same patch.
>> This is pointless and causes long queues to progress slowly.
>>
>> My idea was to remove these duplicates from the queue, keeping only the
>> latest queued element when I see multiple queued entries for the same
>> jira number. It's like when you go grocery shopping and you're already
>> in line at the cashier but realise you also need e.g. milk: you go grab
>> it and join the END of the queue. So I believe losing one's spot in the
>> queue is a fair price for amending one's patch.
>>
>> That said, Deepak made me realise that for big patches this will be very
>> cumbersome due to the constant rebasing needed to avoid conflicts on
>> patch application.
>> I have three proposals now:
>>
>> 1: Leave this as it currently is (with HIVE-19077 committed) - *only the
>> latest queued job for the same jira will run*
>> pros: no resources wasted running the same patch multiple times;
>> 'scheduling' is fair: if you amend your patch you may lose your original
>> spot in the queue
>> cons: big patches that are prone to conflicts will be hard to get
>> executed in ptest; devs will have to wait longer for their ptest results
>> if they amend their patches
>>
>> 2: *Add a safety switch* to this queue-checking feature (currently
>> proposed in HIVE-19077); deduplication can be switched off on request
>> pros: same as 1, plus more control over this mechanism, i.e. it can be
>> turned off for big/urgent patches
>> cons: big patches that use the switch might still waste resources; also,
>> devs might use the safety switch inappropriately for their own evil
>> benefit :)
>>
>> 3: Deduplication the other way around - *only the first queued job for
>> the same jira will run*; the ptest server will keep a record of patch
>> names and won't execute a patch with an already-seen name and jira
>> number again
>> pros: the same patch will not be executed multiple times accidentally;
>> big patches won't be a problem either; devs will get their ptest results
>> back earlier even if more jobs are triggered for the same jira/patch name
>> cons: scheduling is less fair: devs can reserve their spots in the queue
>>
>>
>> (0: restore the original behaviour: I'm strongly against this; the ptest
>> queue is already too big as it is, and we have to at least try to
>> decrease its size by deduplicating jiras in it)
>>
>> I'm personally fine with any of methods 1, 2 and 3 listed above, with my
>> favourites being 2 and 3.
>> Let me know which one you think is the right path to go down.
>>
>> Thanks,
>> Adam
>>
>> On 20 April 2018 at 20:14, Eugene Koifman  wrote:
>>
>>> Would it be possible to add patch name validation when it gets added to
>>> the queue?
>>> Currently I think it fails when the bot gets to the patch if it’s not
>>> named correctly.
>>> More common for branch patches
>>>
>>> On 4/20/18, 8:20 AM, "Zoltan Haindrich"  wrote:
>>>
>>>    Hello,
>>>
>>>    Some time ago the ptest queue worked the following way:
>>>
>>>    * for some reason ATTACHMENT_ID was not set by the upstream jira
>>>    scanner tool; this triggered a feature in Jenkins: if multiple
>>>    patches were uploaded for the same ticket, they didn't trigger new
>>>    runs (because the parameters were the same)
>>>    * this was fixed at some point... around that time I started
>>>    getting multiple ptest executions for the same ticket - because I've
>>
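Option 3 discussed above (the ptest server remembers which jira/patch-name pairs it has already seen and skips repeats) can be sketched as a small queue wrapper. The class and method names below are illustrative only, not the actual ptest code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of "option 3": only the first queued job per
 *  (jira, patch name) pair runs; later duplicates are skipped.
 *  Not ptest's actual implementation. */
public class DedupQueue {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    /** key is e.g. "HIVE-19077:HIVE-19077.2.patch"; returns false if skipped. */
    public boolean enqueue(String key) {
        if (seen.contains(key)) {
            return false;          // this exact patch was already queued or run
        }
        seen.add(key);
        queue.addLast(key);
        return true;
    }

    /** Next job to execute, or null when the queue is empty. */
    public String next() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        DedupQueue q = new DedupQueue();
        q.enqueue("HIVE-19077:HIVE-19077.1.patch");
        q.enqueue("HIVE-19077:HIVE-19077.1.patch"); // duplicate -> skipped
        q.enqueue("HIVE-19077:HIVE-19077.2.patch"); // new patch name -> queued
        System.out.println(q.next());
        System.out.println(q.next());
    }
}
```

Note the trade-off the thread identifies: because the key includes the patch name, a dev can "reserve" a queue spot by uploading a renamed patch early.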

Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-04-25 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review201965
---




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Lines 426 (patched)


This seems to switch the test from one API to another. Could it be parametrized so the old one doesn't lose coverage?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Lines 753 (patched)


parametrize test instead?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
Lines 823 (patched)


Same as above.



serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java
Lines 148 (patched)


Obsolete comment.



streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
Line 82 (original), 95 (patched)


Should this throw if this.conn != null, to make sure Writers are not shared?



streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
Lines 113 (patched)


Is there a plan to support MM tables? (With a batch size of 1 it can be done.) They won't implement AcidOutputFormat.



streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
Lines 399 (patched)


Presumably it also already exists for dynamic partitions?



streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
Lines 402 (patched)


The log msg seems a bit misleading, since createPartitionIfNotExists() may or may not create anything. Perhaps createPartitionIfNotExists() should return status indicating whether it actually created something, or do the logging in that method.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 76 (patched)


It should explain the threading model somewhere. I assume everything is meant to be single-threaded, though it's common for client code to process cancel-type events in a separate thread; here that would presumably map to close() (or perhaps abortTransaction()).



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 79 (patched)


"bind" rather than "build"?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 283 (patched)


Not sure we want to expose it in such an obvious way.
It may be better if there is a table prop for this, or perhaps the doc should say that this is 'advisory' and we may ignore the value. Longer term we'd like to consider an option where we force batch size = 1. This is the only option for some cloud stores, and it generally simplifies ACID if we don't have deltas_x_y with x<>y, which may contain aborted txns.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 377 (patched)


Why does this need Alter Table? You already have a metastore connection and you know that you are adding a partition w/o data. In other words, why not just msClient.add_partition() and handle AlreadyExistsException, or add_partitions(List<Partition> partitions, boolean ifNotExists, boolean needResults)?

This seems simpler and more efficient.
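The pattern suggested in this comment — attempt the add directly and treat an "already exists" failure as success — can be shown with a self-contained sketch. The in-memory store below is a purely illustrative stand-in for the real metastore client (IMetaStoreClient.add_partition and AlreadyExistsException in Hive); the method names mirror the review discussion but are not Hive's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of idempotent partition creation: try the add, and treat an
 *  already-exists failure as success. The in-memory map stands in for
 *  the real metastore; names here are illustrative only. */
public class PartitionCreator {
    static class AlreadyExistsException extends Exception {}

    /** Toy stand-in for the metastore: partition name -> location. */
    private final Map<String, String> store = new ConcurrentHashMap<>();

    void addPartition(String name, String location) throws AlreadyExistsException {
        if (store.putIfAbsent(name, location) != null) {
            throw new AlreadyExistsException();
        }
    }

    /** Returns true only if this call actually created the partition,
     *  which also gives the caller accurate information to log. */
    public boolean createPartitionIfNotExists(String name, String location) {
        try {
            addPartition(name, location);
            return true;               // we created it
        } catch (AlreadyExistsException e) {
            return false;              // someone else created it first; fine
        }
    }
}
```

Returning the created/not-created status also addresses the earlier comment about the misleading log message in AbstractRecordWriter.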



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 399 (patched)


use Warehouse.makePartName()?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 996 (patched)


"we don'table wait"



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 1050 (patched)


is this for testing?  How can the user control these?



streaming/src/java/org/apache/hive/streaming/RecordWriter.java
Lines 53 (patched)


maybe make this onNewBatch() since it's not starting a batch



streaming/src/java/org/apache/hive/streaming/RecordWriter.java
Line 42 (original), 60 (patched)


close() or closeWriter()?



streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
Lines 261 (patched)


why does it need bucketid?


- Eugene Koifman


On 

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
> > Lines 179 (patched)
> > 
> >
> > Check the existing table params for bucketing_version before 
> > hard-coding to v2.

Will do that.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
> > Lines 143 (patched)
> > 
> >
> > This derives from Operator? So it should already have the 
> > bucketingVersion field from that?

Good point. Let me verify this and work on it.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java
> > Line 339 (original), 339 (patched)
> > 
> >
> > I think this change is no longer necessary.

I will verify and update accordingly.


> On April 26, 2018, 1:11 a.m., Jason Dere wrote:
> > standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
> > Lines 89 (patched)
> > 
> >
> > Is this no longer used?

I might have missed the place where it is used in the original patch. It is beyond the scope of this patch; will track it in HIVE-19311, which makes this redundant anyway.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201975
---


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses the Java hash, which does not distribute and bucket table data
> as well or as efficiently as Murmur hash.
> Migrate to Murmur hash, but keep backward compatibility for existing
> users so that they don't have to reload their existing tables.
> 
> To keep backward compatibility, bucketing_version is added as a table
> property, resulting in a high number of test result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> 
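The description's claim that Murmur distributes rows across buckets better than Java's String.hashCode can be illustrated with a self-contained sketch of 32-bit Murmur3 plus a bucket-selection step. The `(hash & Integer.MAX_VALUE) % numBuckets` mapping is a common convention and is shown for illustration; it is not a copy of Hive's exact bucketing code:

```java
/** Minimal Murmur3 (32-bit) plus a bucket-selection step, to illustrate
 *  how a mixing hash spreads keys across buckets. Illustrative only,
 *  not Hive's actual bucketing implementation. */
public class BucketHashDemo {
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed, i = 0;
        // body: mix 4 bytes at a time (little-endian)
        for (; data.length - i >= 4; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        // tail: remaining 1-3 bytes (intentional switch fall-through)
        int k = 0;
        switch (data.length - i) {
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        // finalization: avalanche the bits
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Map a hash to a bucket id in [0, numBuckets). */
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int[] counts = new int[8];
        for (int i = 0; i < 1000; i++) {
            counts[bucket(murmur3_32(("key-" + i).getBytes(), 0), 8)]++;
        }
        for (int c : counts) System.out.print(c + " ");
    }
}
```

Because the existing hash is kept for old tables, a per-table bucketing-version property (as in this patch) is what lets readers and writers agree on which function produced the bucket files.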

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201975
---




hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
Lines 179 (patched)


Check the existing table params for bucketing_version before hard-coding to v2.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
Lines 143 (patched)


This derives from Operator? So it should already have the bucketingVersion 
field from that?



ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java
Line 339 (original), 339 (patched)


I think this change is no longer necessary.



standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
Lines 89 (patched)


Is this no longer used?


- Jason Dere


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> [...]

Re: Review Request 66808: HIVE-19312 MM tables don't work with BucketizedHIF

2018-04-25 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66808/
---

(Updated April 26, 2018, 1:11 a.m.)


Review request for hive, Eugene Koifman and Seong (Steve) Yeom.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
e09c6ecac0 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 611a4c346b 
  ql/src/test/queries/clientpositive/mm_bhif.q PRE-CREATION 
  ql/src/test/results/clientpositive/mm_bhif.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66808/diff/2/

Changes: https://reviews.apache.org/r/66808/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



Review Request 66808: HIVE-19312 MM tables don't work with BucketizedHIF

2018-04-25 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66808/
---

Review request for hive, Eugene Koifman and Seong (Steve) Yeom.


Repository: hive-git


Description
---

see jira


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
df8441727d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java b0ec5abcce 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 15e6c34fa5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 969c591917 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java dbda5fdef4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReplTxnTask.java 2615072f5e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 25035433c7 
  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
e09c6ecac0 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 611a4c346b 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4661881301 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
605bb09cab 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java 
f753a903b3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
b850ddc9d0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java c268ddc879 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 46761ffaec 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java f15b3c3879 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9a1e3a1af5 
  ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java 
6d2de0a3ae 
  ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java 
902b37f787 
  ql/src/test/queries/clientpositive/mm_bhif.q PRE-CREATION 
  ql/src/test/results/clientpositive/mm_bhif.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66808/diff/1/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19312) MM tables don't work with BucketizedHIF

2018-04-25 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19312:
---

 Summary: MM tables don't work with BucketizedHIF
 Key: HIVE-19312
 URL: https://issues.apache.org/jira/browse/HIVE-19312
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-19312.patch





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: ptest queue

2018-04-25 Thread Thejas Nair
Option 3 seems reasonable. I believe that used to be the state a while
back (maybe 12 months back or so).
When 2nd ptest for same jira runs, it checks if the latest patch has
already been run.


On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary  wrote:
> [...]
>
>>> On 4/20/18, 8:20 AM, "Zoltan Haindrich"  wrote:
>>>
>>>    Hello,
>>>
>>>    Some time ago the ptest queue worked the following way:
>>>
>>>    * for some reason ATTACHMENT_ID was not set by the upstream jira
>>>    scanner tool; this triggered a feature in Jenkins: if multiple
>>>    patches were uploaded for the same ticket, they didn't trigger new
>>>    runs (because the parameters were the same)
>>>    * this was fixed at some point... around that time I started
>>>    getting multiple ptest executions for the same ticket - because I've
>>>    fixed a minor typo after submitting the first version of my patch...
>>>    * currently we also have a jenkins queue reader inside the ptest
>>>    job... which checks if the ticket is in the queue right now, and if
>>>    it is, it just exits... this logic kinda restores the earlier
>>>    behaviour, with the exception that if I upload a patch every day and
>>>    the queue is longer than 1 day (like now), I will never get a ptest
>>>    run :D
>>>    * ...now here I come! I've just removed my patch from yesterday,
>>>    because I want a ptest run 

Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

2018-04-25 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
---

Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho 
Rodríguez, and Vineet Garg.


Bugs: HIVE-19311
https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
---

Currently, the "load data" statement is very limited. It errors out if any of the required information is missing, such as partitioning info when the table is partitioned, or appropriately named files when the table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 70846ac3ce 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 7d33fa3892 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
2f3b07f4af 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/1/


Testing
---

Added a unit test.


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-19311) Partition and bucketing support for “load data” statement

2018-04-25 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-19311:
-

 Summary: Partition and bucketing support for “load data” statement
 Key: HIVE-19311
 URL: https://issues.apache.org/jira/browse/HIVE-19311
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


Currently, the "load data" statement is very limited. It errors out if any of the required information is missing, such as partitioning info when the table is partitioned, or appropriately named files when the table is bucketed.

It should be able to launch an insert job to load the data instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19310) Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env

2018-04-25 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-19310:
---

 Summary: Metastore: MetaStoreDirectSql.ensureDbInit has some slow 
DN calls which might need to be run only in test env
 Key: HIVE-19310
 URL: https://issues.apache.org/jira/browse/HIVE-19310
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0, 3.1.0
Reporter: Vaibhav Gumashta


MetaStoreDirectSql.ensureDbInit has the following 2 calls which we have 
observed taking a long time in our testing:
{code}
initQueries.add(pm.newQuery(MNotificationLog.class, "dbName == ''"));
initQueries.add(pm.newQuery(MNotificationNextId.class, "nextEventId < -1"));
{code}
In a production environment these tables should be initialized using 
schematool; in a test environment, however, these calls might be needed. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-04-25 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On March 27, 2018, 12:53 p.m., Peter Vary wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
> > Line 712 (original), 712 (patched)
> > 
> >
> > Is it possible to behave differently, when we have information about 
> > the number of rows, and when we do not know anything? The returned number 
> > will be 0 in this case, which might cause interesting behavior I guess :)
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This is called only when resultSet is not present (like in insert 
> queries). If it returns zero, it will display No rows affected, which is like 
> the current behavior.
> 
> Peter Vary wrote:
> So the code handles -1 and 0 in the same way?
> Previously we returned -1, indicating we do not have info about the
> affected rows. Now we will always return 0 if we do not have the exact
> info, like when running HoS.
> 
> Bharathkrishna Guruvayoor Murali wrote:
> I have changed the default value in QueryState for numModifiedRows to -1 
> from 0.
> Currently it shows the same message on beeline for both -1 and 0, but I 
> guess it is good to have default as -1 in case we have any use-case that 
> needs to distinguish these two cases.

Please ignore my previous message; I think my change giving numModifiedRows a 
default value of -1 in QueryState won't have an effect, because it will be 
overwritten with 0.
I will keep the default value of numModifiedRows as 0.
I do not think this is a problem: even with the present behavior, when -1 is 
returned it shows "No rows affected", so the current behavior won't change.


- Bharathkrishna
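The -1 vs 0 distinction being discussed can be illustrated with a small client-side sketch (the message strings are illustrative, not Beeline's exact wording): since both -1 ("unknown") and 0 render the same message, defaulting to 0 is behavior-preserving for display purposes.

```python
def rows_affected_message(num_modified_rows):
    """Map a modified-row count to a user-facing message.

    -1 means "no information from the engine"; it currently renders
    exactly like 0, which is why the default value barely matters
    for display purposes.
    """
    if num_modified_rows <= 0:
        return "No rows affected"
    suffix = "row" if num_modified_rows == 1 else "rows"
    return "%d %s inserted" % (num_modified_rows, suffix)

print(rows_affected_message(-1))  # -> No rows affected
print(rows_affected_message(0))   # -> No rows affected
print(rows_affected_message(26))  # -> 26 rows inserted
```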


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review200042
---


On April 23, 2018, 9:56 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 23, 2018, 9:56 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run an insert command in Beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive message would be "xxx rows inserted (26.068 
> seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> The count is obtained in FileSinkOperator and set in statsMap, but only when 
> the operator writes table-specific rows for the given operation (so that we 
> count only the rows inserted into the table and avoid counting non-table 
> file-sink operations that happen during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 9cb2ff101581d22965b447e82601970d909daefd 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: 

Re: Review Request 66788: HIVE-19282 don't nest delta directories inside LB directories for ACID tables

2018-04-25 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66788/
---

(Updated April 25, 2018, 6:41 p.m.)


Review request for hive, Prasanth_J and Seong (Steve) Yeom.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4661881301 
  ql/src/test/queries/clientpositive/mm_all.q 4ffbb6b98a 


Diff: https://reviews.apache.org/r/66788/diff/2/

Changes: https://reviews.apache.org/r/66788/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19309) Add Arrow dependencies to LlapServiceDriver

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19309:
---

 Summary: Add Arrow dependencies to LlapServiceDriver
 Key: HIVE-19309
 URL: https://issues.apache.org/jira/browse/HIVE-19309
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter


Need to make arrow jars available to daemons.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19308) Provide an Arrow stream reader for external LLAP clients

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19308:
---

 Summary: Provide an Arrow stream reader for external LLAP clients 
 Key: HIVE-19308
 URL: https://issues.apache.org/jira/browse/HIVE-19308
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


This is a sub-class of LlapBaseRecordReader that wraps the socket inputStream 
and produces Arrow batches for an external client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19307) Support ArrowOutputStream in LlapOutputFormatService

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19307:
---

 Summary: Support ArrowOutputStream in LlapOutputFormatService
 Key: HIVE-19307
 URL: https://issues.apache.org/jira/browse/HIVE-19307
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Support pushing Arrow batches through 
org.apache.arrow.vector.ipc.ArrowOutputStream in LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19306) Arrow batch serializer

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19306:
---

 Summary: Arrow batch serializer
 Key: HIVE-19306
 URL: https://issues.apache.org/jira/browse/HIVE-19306
 Project: Hive
  Issue Type: Task
  Components: Serializers/Deserializers
Reporter: Eric Wohlstadter
Assignee: Teddy Choi


Leverage the ThriftJDBCBinarySerDe code path that already exists in 
SemanticAnalyzer/FileSinkOperator to create a serializer that batches rows into 
Arrow vector batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19305) Arrow format for LlapOutputFormatService (umbrella)

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19305:
---

 Summary: Arrow format for LlapOutputFormatService (umbrella)
 Key: HIVE-19305
 URL: https://issues.apache.org/jira/browse/HIVE-19305
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Allows external clients to consume output from LLAP daemons in Arrow stream 
format.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19304) Update templates.py based on config changes in YARN-7142 and YARN-8122

2018-04-25 Thread Gour Saha (JIRA)
Gour Saha created HIVE-19304:


 Summary: Update templates.py based on config changes in YARN-7142 
and YARN-8122
 Key: HIVE-19304
 URL: https://issues.apache.org/jira/browse/HIVE-19304
 Project: Hive
  Issue Type: Sub-task
Reporter: Gour Saha


Now that YARN-7142 is committed and YARN-8122 will be committed soon, we need 
to update templates.py based on config changes for placement policy and health 
threshold monitor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 66800: HIVE-6980 Drop table by using direct sql

2018-04-25 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66800/
---

Review request for hive, Alexander Kolbasov, Alan Gates, Marta Kuczora, Adam 
Szita, and Vihang Karajgaonkar.


Bugs: HIVE-6980
https://issues.apache.org/jira/browse/HIVE-6980


Repository: hive-git


Description
---

First version of the patch.

Splits getPartitionsViaSqlFilterInternal into:

- getPartitionIdsViaSqlFilter, which returns the partition ids
- getPartitionsFromPartitionIds, which returns the partition data for those ids

Creates dropPartitionsByPartitionIds, which drops the partitions via direct SQL 
commands.

Creates dropPartitionsViaSqlFilter using getPartitionIdsViaSqlFilter and 
dropPartitionsByPartitionIds.

Modifies the ObjectStore to drop partitions with directsql if possible.
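The filter → ids → drop split described above can be sketched generically. This is a toy sqlite3 model of the two-phase approach (the table names and schema are invented for illustration, not the metastore schema or the patch itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT)")
conn.execute("CREATE TABLE PARTITION_PARAMS (PART_ID INTEGER, KEY TEXT, VALUE TEXT)")
conn.executemany("INSERT INTO PARTITIONS VALUES (?, ?)",
                 [(1, "ds=2018-04-24"), (2, "ds=2018-04-25"), (3, "ds=2018-04-26")])
conn.executemany("INSERT INTO PARTITION_PARAMS VALUES (?, 'k', 'v')",
                 [(1,), (2,), (3,)])

def get_partition_ids_via_sql_filter(where, args):
    # Phase 1: resolve the filter to partition ids only (one cheap query).
    return [r[0] for r in conn.execute(
        "SELECT PART_ID FROM PARTITIONS WHERE " + where, args)]

def drop_partitions_by_partition_ids(ids):
    # Phase 2: delete dependent rows first, then the partitions themselves,
    # so that no orphan objects are left behind in the database.
    marks = ",".join("?" * len(ids))
    conn.execute("DELETE FROM PARTITION_PARAMS WHERE PART_ID IN (%s)" % marks, ids)
    conn.execute("DELETE FROM PARTITIONS WHERE PART_ID IN (%s)" % marks, ids)

ids = get_partition_ids_via_sql_filter("PART_NAME < ?", ("ds=2018-04-26",))
drop_partitions_by_partition_ids(ids)
print(conn.execute("SELECT COUNT(*) FROM PARTITIONS").fetchone()[0])        # -> 1
print(conn.execute("SELECT COUNT(*) FROM PARTITION_PARAMS").fetchone()[0])  # -> 1
```

Separating id resolution from deletion is what lets the same id list feed either the data-fetch path or the batched direct-SQL drop path.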


Diffs
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
 997f5fd 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 184ecb6 


Diff: https://reviews.apache.org/r/66800/diff/1/


Testing
---

Ran the TestDropPartition tests; also checked the database manually to verify 
that no objects were left behind in the database.


Thanks,

Peter Vary



[jira] [Created] (HIVE-19303) Fix grammar warnings

2018-04-25 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19303:
---

 Summary: Fix grammar warnings
 Key: HIVE-19303
 URL: https://issues.apache.org/jira/browse/HIVE-19303
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


It seems that something is not right around the handling of "KW_CHECK":

https://github.com/apache/hive/blob/da10aabe56edf8fbb26d89d64bedcc4afa84a305/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g#L2376

{code}
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK {KW_EXISTS, KW_TINYINT}" using 
multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_STRUCT LESSTHAN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_DATETIME" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_DATE {LPAREN, StringLiteral}" 
using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
Decision can match input such as "KW_CHECK KW_UNIONTYPE LESSTHAN" using 
multiple alternatives: 1, 2

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19302) Logging Too Verbose For TableNotFound

2018-04-25 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-19302:
--

 Summary: Logging Too Verbose For TableNotFound
 Key: HIVE-19302
 URL: https://issues.apache.org/jira/browse/HIVE-19302
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 2.2.0, 3.0.0
Reporter: BELUGA BEHR
 Attachments: table_not_found_cdh6.txt

There is far too much logging when a user submits a query against a table that 
does not exist.  In an ad-hoc setting, it is quite normal for a user to 
fat-finger a table name.  Yet, judging by the volume and severity of the 
logging, a Hive administrator might conclude there was a major issue.  
Please change the logging to INFO level, and do not print a stack trace, for 
such a trivial error.

 

See the attached file for a sample of what logging a single "table not found" 
query generates.
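The requested change amounts to logging the condition as a single INFO line instead of an ERROR with a stack trace. A generic illustration using Python's logging module (not Hive's actual logger calls; the logger name and message are assumptions):

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("hive.ql.Driver")

def report_missing_table(name):
    # Verbose style the report complains about (ERROR + full stack trace):
    #   log.error("Table not found: %s", name, exc_info=True)
    # Proposed style: one INFO line, no stack trace.
    log.info("Table not found: %s", name)
    return "Table not found: %s" % name

msg = report_missing_table("custmer")
```

The exception can still be raised to the client; only the server-side log severity and stack trace change.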



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #329: HIVE-19089 : Create/Replicate AllocWriteId Event

2018-04-25 Thread maheshk114
Github user maheshk114 closed the pull request at:

https://github.com/apache/hive/pull/329


---


[jira] [Created] (HIVE-19301) fix hive script to not rely on path expression result to be 0 or 1 file

2018-04-25 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19301:
---

 Summary: fix hive script to not rely on path expression result to 
be 0 or 1 file
 Key: HIVE-19301
 URL: https://issues.apache.org/jira/browse/HIVE-19301
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


If everything goes well, the test matches 0 or 1 file and all is fine; however, 
when it matches more than one file, there is a "too many arguments" error:

https://github.com/apache/hive/blob/10699bf1498b677a852c0faa1279d3c904151b73/bin/hive#L123
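A generic reproduction of the failure mode (not the actual bin/hive line) and a glob-safe alternative using an array:

```shell
set -u
tmpdir=$(mktemp -d)
touch "$tmpdir/a.jar" "$tmpdir/b.jar"

# Naive pattern: with two matches the glob expands to two words, so `[`
# fails with "too many arguments" (status 2) instead of answering the test.
if [ -f $tmpdir/*.jar ] 2>/dev/null; then
  naive="ok"
else
  naive="failed:$?"
fi

# Glob-safe alternative: expand into an array and probe the first element;
# this works for 0, 1, or many matching files.
jars=( "$tmpdir"/*.jar )
if [ -e "${jars[0]}" ]; then
  robust="found ${#jars[@]} jar(s)"
fi

echo "$naive"
echo "$robust"
rm -rf "$tmpdir"
```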




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19300) Skip Druid/JDBC rules in optimizer when there are no Druid/JDBC sources

2018-04-25 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19300:
--

 Summary: Skip Druid/JDBC rules in optimizer when there are no 
Druid/JDBC sources
 Key: HIVE-19300
 URL: https://issues.apache.org/jira/browse/HIVE-19300
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Druid integration, JDBC
Affects Versions: 3.0.0, 3.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


When there are no Druid/JDBC sources in a plan, we can skip the complete blocks 
that trigger rules to push computation to Druid/JDBC, since those rules will 
have no effect on the plan; this avoids the overhead of traversing the plan 
looking for matches.
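The proposed guard amounts to a cheap pre-scan of the plan before registering the pushdown rule blocks. An illustrative sketch (the names and plan representation are invented, not Calcite/Hive classes):

```python
def plan_sources(plan):
    """Yield the source kind of every leaf scan in a toy plan tree."""
    if isinstance(plan, tuple):            # interior node: (op, *children)
        for child in plan[1:]:
            yield from plan_sources(child)
    else:                                  # leaf: "native" | "druid" | "jdbc"
        yield plan

def apply_rules(plan, rules):
    kinds = set(plan_sources(plan))
    # Skip whole rule blocks whose required source kind is absent from the
    # plan; they could never match, so traversing the plan for them is waste.
    applicable = [r for r in rules if r["requires"] in kinds]
    return [r["name"] for r in applicable]

rules = [{"name": "DruidFilterPushdown", "requires": "druid"},
         {"name": "JDBCJoinPushdown", "requires": "jdbc"}]
plan = ("join", "native", "native")
print(apply_rules(plan, rules))  # -> []
```

One linear scan over the plan decides whether an entire rule set can be skipped, which is cheaper than letting the rule engine try each rule against every node.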



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: ptest queue

2018-04-25 Thread Peter Vary
I would vote for version 3. It would solve the big patch problem, and removes 
the unnecessary test runs too.

Thanks,
Peter
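For reference, the two deduplication policies under discussion (latest-wins vs first-wins) can be sketched as simple queue filters. This is an illustrative model, not the HIVE-19077 code:

```python
def dedup_latest_wins(queue):
    """Keep only the last queued entry per JIRA (option 1/2 behavior)."""
    result = []
    seen = set()
    for jira, patch in reversed(queue):
        if jira not in seen:
            seen.add(jira)
            result.append((jira, patch))
    return list(reversed(result))

def dedup_first_wins(queue, already_run):
    """Run each (jira, patch) pair at most once (option 3 behavior)."""
    result = []
    for jira, patch in queue:
        if (jira, patch) not in already_run:
            already_run.add((jira, patch))
            result.append((jira, patch))
    return result

q = [("HIVE-1", "1.patch"), ("HIVE-2", "1.patch"), ("HIVE-1", "2.patch")]
print(dedup_latest_wins(q))  # -> [('HIVE-2', '1.patch'), ('HIVE-1', '2.patch')]
print(dedup_first_wins(q + q, set()))  # duplicate queue entries run only once
```

Latest-wins drops the stale HIVE-1 entry (the dev "loses their spot"); first-wins records executed patch names so re-queued duplicates are skipped while a genuinely new patch still runs.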

> On Apr 25, 2018, at 11:01 AM, Adam Szita  wrote:
> 
> Hi all,
> 
> I had a patch (HIVE-19077) committed with the original aim being the
> prevention of wasting resources when running ptest on the same patch
> multiple times:
> It is supposed to manage scenarios where a developer uploads
> HIVE-XYZ.1.patch, that gets queued in jenkins, then before execution
> HIVE-XYZ.2.patch (for the same jira) is uploaded and that gets queued also.
> When the first patch starts to execute ptest will see that patch2 is the
> latest patch and will use that. After some time the second queued job will
> also run on this very same patch.
> This is just pointless and causes long queues to progress slowly.
> 
> My idea was to remove these duplicates from the queue where I'd only keep
> the latest queued element if I see more queued entries for the same jira
> number. It's like when you go grocery shopping and you're already in line
> at cashier but you realise you also need e.g. milk. You go grab it and join
> the END of the queue. So I believe it's a fair punishment for losing one's
> spot in the queue for making amends on their patch.
> 
> That said Deepak made me realise that for big patches this will be very
> cumbersome due to the need of constant rebasing to avoid conflicts on patch
> application.
> I have three proposals now:
> 
> 1: Leave this as it currently is (with HIVE-19077 committed) - *only the
> latest queued job will run of the same jira*
> pros: no wasting resources to run the same patches more times, 'scheduling'
> is fair: if you amend you're patch you may loose your original spot in the
> queue
> cons: big patches that are prone to conflicts will be hard to get executed
> in ptest, devs will have to wait more time for their ptest results if they
> amend their patches
> 
> 2: *Add a safety switch* to this queue checking feature (currently proposed
> in HIVE-19077), deduplication can be switch off on request
> pros: same as 1st, + ability to have more control on this mechanism i.e.
> turn it off for big/urgent patches
> cons: big patches that use the swich might still waste resources, also devs
> might use safety switch inappropriately for their own evil benefit :)
> 
> 3: Deduplication the other way around - *only the first queued job will run
> of the same jira*, ptest server will keep record of patch names and won't
> execute a patch with a seen name and jira number again
> pros: same patches will not be executed more times accidentally, big
> patches won't be a problem either, devs will get their ptest result back
> earlier even if more jobs are triggered for same jira/patch name
> cons: scheduling is less fair: devs can reserve their spots in the queue
> 
> 
> (0: restore original: I'm strongly against this, ptest queue is already too
> big as it is, we have to at least try and decrease its size by
> deduplicating jiras in it)
> 
> I'm personally fine with any of the 1,2,3 methods listed above, with my
> favourites being 2 and 3.
> Let me know which one you think is the right path to go down on.
> 
> Thanks,
> Adam
> 
> On 20 April 2018 at 20:14, Eugene Koifman  wrote:
> 
>> Would it be possible to add patch name validation when it gets added to
>> the queue?
>> Currently I think it fails when the bot gets to the patch if it’s not
>> named correctly.
>> More  common for branch patches
>> 
>> On 4/20/18, 8:20 AM, "Zoltan Haindrich"  wrote:
>> 
>>Hello,
>> 
>>Some time ago the ptest queue worked the following way:
>> 
>>* for some reason ATTACHMENT_ID was not set by the upstream jira
>> scanner
>>tool; this  triggered a feature in Jenkins: if for the same ticket
>>mutliple patches were uploaded; they didn't triggered new runs
>> (because
>>the parameters were the same)
>>* this have become fixed at some point...around that time I started
>>getting multiple ptest executions for the same ticket - because I've
>>fixed a minor typo after submitting the first version of my patch...
>>* currently we also have a jenkins queue reader inside the ptest
>>job...which checks if the ticket is in the queue right now; and if is
>>it, it just exits...this logic kinda restores the earlier behaviour;
>>with the exception that if I upload a patch every day and the queue is
>>longer that 1day (like now); I will never get a ptest run :D
>>* ...now here I come! I've just removed my patch from yesterday;
>> because
>>I want a ptest run with my newest patch; and the only way to force the
>>above logic to do thatis by removing that attachment..
>> 
>> 
>>So...could we go back to the state when the attachment_id was ignored?
>>I would recommend to remove the ATTACHMENT_ID from the jenkins
>> parameters...
>> 
>>cheers,
>>Zoltan
>> 
>>JenkinsQueueUtil.java:

insert overwrite directory ... row format delimited fields terminated by ... problem with same separator in fields

2018-04-25 Thread ImMr.K
HI all,


insert overwrite directory ... row format delimited fields terminated by ',' 
select col1,col2 from...
If col2 itself contains a comma, the result file will only contain col1 and 
the part of col2 before the first comma.
This is with hive-1.1.0-cdh5.11.2.
Any advice or suggestions will be appreciated. Thanks.


Ze
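The root cause is a delimiter collision: the delimited text output has no quoting, so a comma inside col2 is indistinguishable from the field separator. A language-neutral illustration in Python (not Hive code; the usual workaround is choosing a delimiter that cannot occur in the data, such as the default '\001'):

```python
import csv
import io

row = ["001", "a,b"]  # col2 itself contains the delimiter

# What a delimiter-only format produces: a reader cannot tell the embedded
# comma from a separator, so col2 is split into two fields.
flat = ",".join(row)
print(flat.split(","))  # -> ['001', 'a', 'b']

# With quoting (or a delimiter absent from the data), the round trip is
# lossless.
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(next(csv.reader(io.StringIO(buf.getvalue()))))  # -> ['001', 'a,b']
```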

[jira] [Created] (HIVE-19299) Add additional tests for pushing computation to JDBC storage handler

2018-04-25 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19299:
--

 Summary: Add additional tests for pushing computation to JDBC 
storage handler
 Key: HIVE-19299
 URL: https://issues.apache.org/jira/browse/HIVE-19299
 Project: Hive
  Issue Type: Bug
  Components: JDBC, StorageHandler
Affects Versions: 3.0.0, 3.1.0
Reporter: Jesus Camacho Rodriguez


After HIVE-18423 has been pushed, we need to add more tests to assess whether 
all rules are working properly and whether computation for different operators 
is being pushed through JDBC.

This includes extending test coverage for:
- Project with different expression types.
- Filter with different conditions.
- Join between JDBC sources and mixed.
- Aggregate.
- Union.
- Sort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #288: HIVE-18423

2018-04-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/288


---


[jira] [Created] (HIVE-19298) Fix operator tree of CTAS for Druid Storage Handler

2018-04-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19298:
-

 Summary: Fix operator tree of CTAS for Druid Storage Handler
 Key: HIVE-19298
 URL: https://issues.apache.org/jira/browse/HIVE-19298
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


The current operator plan of CTAS for the Druid storage handler is broken when 
the user enables the property {code}hive.exec.parallel{code} (i.e. sets it to 
{code}true{code}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19297) Hive on Tez - 2 jobs are not launched on same queue

2018-04-25 Thread Alexandre Linte (JIRA)
Alexandre Linte created HIVE-19297:
--

 Summary: Hive on Tez - 2 jobs are not launched on same queue
 Key: HIVE-19297
 URL: https://issues.apache.org/jira/browse/HIVE-19297
 Project: Hive
  Issue Type: Bug
 Environment: Hadoop : 2.7.4

Hive : 2.3.0

Tez : 0.8.4
Reporter: Alexandre Linte


Hello,

I have a strange issue concerning Hive jobs launched with Tez executors.

I specify a Tez queue before launching my Hive job. It is honored for the 
first job, but if I launch another job the queue is not the same, keeping in 
mind that I am in the same Hive session.

The issue appears both in the Hive shell (where I also specify the Tez 
executor) and in Beeline:
|[application_152165224_5056|https://knox-big-ech.itn.ftgroup/gateway/bigdata/yarn/cluster/app/application_1521652293224_509656]|user1|HIVE-33ba7068-1b50-494c-bc4b-11327f383d8f|TEZ|default|Wed
 Apr 25 15:48:10 +0200 2018|N/A|ACCEPTED|UNDEFINED| 
|[ApplicationMaster|https://knox-big-ech.itn.ftgroup/gateway/bigdata/yarn/proxy/application_1521652293224_509656]|0|
|[application_152163224_5092|https://knox-big-ech.itn.ftgroup/gateway/bigdata/yarn/cluster/app/application_1521652293224_509592]|user1|HIVE-c34b6f99-c0e8-4d4a-bef4-6c1f1d4addbd|TEZ|HQ_XXX|Wed
 Apr 25 15:39:30 +0200 2018|Wed Apr 25 15:45:47 +0200 2018|FINISHED|SUCCEEDED| 
|[History|https://knox-big-ech.itn.ftgroup/gateway/bigdata/yarn/proxy/application_1521652293224_509592]|N/A|

 

Best Regards.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19296) Add log to record MapredLocalTask Failure

2018-04-25 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-19296:
---

 Summary: Add log to record MapredLocalTask Failure
 Key: HIVE-19296
 URL: https://issues.apache.org/jira/browse/HIVE-19296
 Project: Hive
  Issue Type: Bug
  Components: Diagnosability
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


In some cases, when a MapredLocalTask fails around child process start time, we 
cannot find the detailed error information anywhere (not in the stderr log, and 
no MapredLocal log file is written). All we get is:
{noformat}
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: Obtaining error information
*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
Task failed!
Task ID:
  Stage-48

Logs:

*** ERROR org.apache.hadoop.hive.ql.exec.Task: 
[HiveServer2-Background-Pool: Thread-]: 
/var/log/hive/hadoop-cmf-hive1-HIVESERVER2-t.log.out
*** ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: 
[HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1
{noformat}
It is really hard to debug. 
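One way to make such failures debuggable (a generic subprocess sketch, not the MapredLocalTask code) is to capture the child process's stderr and log it alongside the exit status, instead of only "Execution failed with exit status: 1":

```python
import subprocess
import sys

def run_child(argv):
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the child's own stderr with the failure, rather than a
        # bare "Execution failed with exit status: N".
        print("Execution failed with exit status: %d" % proc.returncode)
        print("Child stderr: %s" % proc.stderr.strip())
    return proc.returncode

rc = run_child([sys.executable, "-c",
                "import sys; sys.stderr.write('cannot start child JVM'); sys.exit(1)"])
print(rc)  # -> 1
```

Capturing stderr at spawn time covers exactly the window before the child's own logging is initialized, which is where these silent failures occur.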



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: how to extract metadata of hive tables in speed

2018-04-25 Thread Peter Vary
Thanks for the info too! :)

> On Apr 25, 2018, at 11:12 AM, 侯宗田  wrote:
> 
> Hi,
> 
> Thank you, I have looked up the source code of Hcatalog, it seems every time 
> when I run hcat -e “query”, it called hcatCli, then it make configuration, 
> create and start a session, then dump it after being used. It can’t keep a 
> session or connection and don’t have a Cli. The initialization take all the 
> time. Therefore, I only can use the thrift API to do my job. Thank you for 
> your precious suggestions!
> 
> Best regards,
> Hou
>> 在 2018年4月24日,下午7:45,Peter Vary  写道:
>> 
>> Hi Hou,
>> 
>> Kudu uses the Thrift HMS interface, and written in C. An example could be 
>> found here:
>> https://github.com/apache/kudu/tree/master/src/kudu/hms 
>> 
>> 
>> As for parametrizing Hcatalog I have only found this:
>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties
>>  
>> 
>> But have not find anything there which might help you there.
>> 
>> Peter
>> 
>>> On Apr 24, 2018, at 10:51 AM, 侯宗田  wrote:
>>> 
>>> Hi, Peter:
>>> I have started a standalone metastore server and it indeed short that part 
>>> of time, it does connection instead of initialization. But I still have 
>>> some questions,
>>> First, I believe the Hcatalog must be quick because it is a mature product 
>>> and I have not seen others complaining about this problem, is there some 
>>> configuration which controls starting new session or how to keep a session 
>>> connected to the HMS, in the log below it started a new session and 
>>> connected twice. 
>>> Second, I am very interested in using the HMS thrift API, but I could not 
>>> found an example of how to use it in C/C++ to access hive table info. Do 
>>> you know some link about it?
>>> Really thank you for your time!!
>>> 
>>> Best regards,
>>> Hou
>>> 
>>> $time ./hcat.py -e "use default; show table extended like haha;"
>>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file 
>>> file:/usr/local/hive/conf/hive-site.xml
>>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>>> library for your platform... using builtin-java classes where applicable
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: 
>>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>>> 18/04/24 15:47:10 INFO ql.Driver: Compiling 
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>>  use default
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>>> URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>>> current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: 
>>> Schema(fieldSchemas:null, properties:null)
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling 
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>>  Time taken: 1.591 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not 
>>> creating a lock manager
>>> 18/04/24 15:47:12 INFO ql.Driver: Executing 
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>>  use default
>>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created 
>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext 
>>> [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
>>> ignored, since hive.security.authorization.manager is set to instance of 
>>> HiveAuthorizerFactory.
>>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration 
>>> hive.metastore.filter.hook changed from 
>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, 
>>> current connections: 0
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>>> URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>>> current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>> 18/04/24 

[jira] [Created] (HIVE-19295) Some multiple inserts do not work on MR engine

2018-04-25 Thread Oleksiy Sayankin (JIRA)
Oleksiy Sayankin created HIVE-19295:
---

 Summary: Some multiple inserts do not work on MR engine
 Key: HIVE-19295
 URL: https://issues.apache.org/jira/browse/HIVE-19295
 Project: Hive
  Issue Type: Bug
Reporter: Oleksiy Sayankin
Assignee: Oleksiy Sayankin


*General Info*

Hive version : 2.3.3

{code}
commit 3f7dde31aed44b5440563d3f9d8a8887beccf0be
Author: Daniel Dai 
Date:   Wed Mar 28 16:46:29 2018 -0700

Preparing for 2.3.3 release

{code}

Hadoop version: 2.7.2.

Engine

{code}
hive> set hive.execution.engine;
hive.execution.engine=mr
{code}

*STEP 1. Create test data*

{code}
DROP TABLE IF EXISTS customer_target;
DROP TABLE IF EXISTS customer_source;
{code}

{code}
CREATE TABLE customer_target (id STRING, first_name STRING, last_name STRING, 
age INT); 
{code}

{code}
insert into customer_target values ('001', 'John', 'Smith', 45), ('002', 
'Michael', 'Watson', 27), ('003', 'Den', 'Brown', 33);
SELECT id, first_name, last_name, age  FROM customer_target;
{code}

{code}
+------+-------------+------------+------+
|  id  | first_name  | last_name  | age  |
+------+-------------+------------+------+
| 002  | Michael     | Watson     | 27   |
| 001  | John        | Smith      | 45   |
| 003  | Den         | Brown      | 33   |
+------+-------------+------------+------+
{code}



{code}
CREATE TABLE customer_source (id STRING, first_name STRING, last_name STRING, 
age INT);

insert into customer_source values ('001', 'Dorothi', 'Hogward', 77), ('007', 
'Alex', 'Bowee', 1), ('088', 'Robert', 'Dowson', 25);
SELECT id, first_name, last_name, age  FROM customer_source;
{code}

{code}
+------+-------------+------------+------+
|  id  | first_name  | last_name  | age  |
+------+-------------+------------+------+
| 088  | Robert      | Dowson     | 25   |
| 001  | Dorothi     | Hogward    | 77   |
| 007  | Alex        | Bowee      | 1    |
+------+-------------+------------+------+
{code}

*STEP 2. Do multiple insert*

{code}
FROM
  `default`.`customer_target` `trg`
  JOIN
  `default`.`customer_source` `src`
  ON `src`.`id` = `trg`.`id`
INSERT INTO `default`.`customer_target`  -- update clause
  SELECT `trg`.`id`, `src`.`first_name`, `src`.`last_name`, `trg`.`age`
  WHERE `src`.`id` = `trg`.`id`
  SORT BY `trg`.`id`
INSERT INTO `default`.`customer_target`  -- insert clause
  SELECT `src`.`id`, `src`.`first_name`, `src`.`last_name`, `src`.`age`
  WHERE `trg`.`id` IS NULL
{code}


*ACTUAL RESULT*

{code}
FAILED: SemanticException [Error 10087]: The same output cannot be present 
multiple times:  customer_target
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: how to extract metadata of hive tables in speed

2018-04-25 Thread 侯宗田
Hi,

Thank you. I have looked at the source code of HCatalog: it seems that every 
time I run hcat -e "query", it calls HCatCli, which then builds the 
configuration, creates and starts a session, and tears it down after use. It 
cannot keep a session or connection open and has no CLI, so the initialization 
takes all the time. Therefore, I can only use the Thrift API to do my job. 
Thank you for your precious suggestions!

Best regards,
Hou
> 在 2018年4月24日,下午7:45,Peter Vary  写道:
> 
> Hi Hou,
> 
> Kudu uses the Thrift HMS interface, and written in C. An example could be 
> found here:
> https://github.com/apache/kudu/tree/master/src/kudu/hms 
> 
> 
> As for parametrizing Hcatalog I have only found this:
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties
>  
> 
> But have not find anything there which might help you there.
> 
> Peter
> 
>> On Apr 24, 2018, at 10:51 AM, 侯宗田  wrote:
>> 
>> Hi, Peter:
>> I have started a standalone metastore server and it indeed shortened that 
>> part of the time; it now just opens a connection instead of doing a full 
>> initialization. But I still have some questions.
>> First, I believe HCatalog must be quick, because it is a mature product and 
>> I have not seen others complaining about this problem. Is there some 
>> configuration which controls starting a new session, or a way to keep a 
>> session connected to the HMS? In the log below it started a new session and 
>> connected twice. 
>> Second, I am very interested in using the HMS Thrift API, but I could not 
>> find an example of how to use it in C/C++ to access Hive table info. Do you 
>> know of some link about it?
>> Really, thank you for your time!!
>> 
>> Best regards,
>> Hou
>> 
>> $time ./hcat.py -e "use default; show table extended like haha;"
>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file 
>> file:/usr/local/hive/conf/hive-site.xml
>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: 
>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>> 18/04/24 15:47:10 INFO ql.Driver: Compiling 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>  use default
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>> URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>> current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: 
>> Schema(fieldSchemas:null, properties:null)
>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>  Time taken: 1.591 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating 
>> a lock manager
>> 18/04/24 15:47:12 INFO ql.Driver: Executing 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>  use default
>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created 
>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext 
>> [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
>> ignored, since hive.security.authorization.manager is set to instance of 
>> HiveAuthorizerFactory.
>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration 
>> hive.metastore.filter.hook changed from 
>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, 
>> current connections: 0
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>> URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>> current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>  Time taken: 0.119 seconds
>> OK
>> 18/04/24 15:47:12 INFO ql.Driver: OK
>> 

Re: ptest queue

2018-04-25 Thread Adam Szita
Hi all,

I had a patch (HIVE-19077) committed with the original aim being the
prevention of wasting resources when running ptest on the same patch
multiple times:
It is supposed to manage scenarios where a developer uploads
HIVE-XYZ.1.patch, that gets queued in jenkins, then before execution
HIVE-XYZ.2.patch (for the same jira) is uploaded and gets queued as well.
When the first job starts to execute, ptest will see that patch 2 is the
latest one and will use it. Some time later the second queued job will
also run on this very same patch.
This is just pointless and causes long queues to progress slowly.

My idea was to remove these duplicates from the queue: I'd only keep
the latest queued element if I see multiple queued entries for the same jira
number. It's like when you go grocery shopping and you're already in line
at the cashier when you realise you also need e.g. milk. You go grab it and join
the END of the queue. So I believe losing one's spot in the queue is a fair
price to pay for amending one's patch.

That said, Deepak made me realise that for big patches this will be very
cumbersome, due to the need for constant rebasing to avoid conflicts when
applying the patch.
I have three proposals now:

1: Leave this as it currently is (with HIVE-19077 committed) - *only the
latest queued job will run for the same jira*
pros: no wasting resources on running the same patch multiple times; 'scheduling'
is fair: if you amend your patch you may lose your original spot in the
queue
cons: big patches that are prone to conflicts will be hard to get executed
in ptest; devs will have to wait longer for their ptest results if they
amend their patches

2: *Add a safety switch* to this queue checking feature (currently proposed
in HIVE-19077), so deduplication can be switched off on request
pros: same as the 1st, + more control over this mechanism, i.e. it can be
turned off for big/urgent patches
cons: big patches that use the switch might still waste resources; also devs
might use the safety switch inappropriately for their own evil benefit :)

3: Deduplicate the other way around - *only the first queued job will run
for the same jira*: the ptest server will keep a record of patch names and won't
execute a patch with an already-seen name and jira number again
pros: the same patch will not accidentally be executed multiple times; big
patches won't be a problem either; devs will get their ptest result back
earlier even if more jobs are triggered for the same jira/patch name
cons: scheduling is less fair: devs can reserve their spots in the queue


(0: restore the original behaviour: I'm strongly against this, the ptest queue
is already too big as it is; we have to at least try to decrease its size by
deduplicating jiras in it)
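For illustration, the bookkeeping behind option 3 could be sketched like this (class and method names are hypothetical, not the actual ptest code):

```python
class PatchDeduplicator:
    """Remembers which (jira, patch name) pairs have already been executed."""

    def __init__(self):
        self.seen = set()

    def should_run(self, jira: str, patch_name: str) -> bool:
        key = (jira, patch_name)
        if key in self.seen:
            return False   # same patch already ran once: skip the duplicate job
        self.seen.add(key)
        return True        # first sighting of this patch revision: run it

dedup = PatchDeduplicator()
first = dedup.should_run("HIVE-XYZ", "HIVE-XYZ.1.patch")    # True: runs
dup = dedup.should_run("HIVE-XYZ", "HIVE-XYZ.1.patch")      # False: duplicate skipped
revised = dedup.should_run("HIVE-XYZ", "HIVE-XYZ.2.patch")  # True: new revision runs
```

A new patch name counts as a new revision and runs, while re-queued jobs for an unchanged patch are dropped, which is the "devs get results earlier" property claimed above.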

I'm personally fine with any of the 1,2,3 methods listed above, with my
favourites being 2 and 3.
Let me know which one you think is the right path to go down on.

Thanks,
Adam

On 20 April 2018 at 20:14, Eugene Koifman  wrote:

> Would it be possible to add patch name validation when it gets added to
> the queue?
> Currently I think it fails when the bot gets to the patch if it’s not
> named correctly.
> More common for branch patches
>
> On 4/20/18, 8:20 AM, "Zoltan Haindrich"  wrote:
>
> Hello,
>
> Some time ago the ptest queue worked the following way:
>
> * for some reason ATTACHMENT_ID was not set by the upstream jira scanner
> tool; this triggered a feature in Jenkins: if multiple patches were
> uploaded for the same ticket, they didn't trigger new runs (because
> the parameters were the same)
> * this got fixed at some point... around that time I started
> getting multiple ptest executions for the same ticket - because I'd
> fixed a minor typo after submitting the first version of my patch...
> * currently we also have a jenkins queue reader inside the ptest
> job... which checks if the ticket is in the queue right now, and if it
> is, it just exits... this logic kinda restores the earlier behaviour,
> with the exception that if I upload a patch every day and the queue is
> longer than 1 day (like now), I will never get a ptest run :D
> * ...now here I come! I've just removed my patch from yesterday,
> because I want a ptest run with my newest patch, and the only way to
> force the above logic to do that is by removing that attachment...
>
>
> So...could we go back to the state when the attachment_id was ignored?
> I would recommend removing the ATTACHMENT_ID from the jenkins
> parameters...
>
> cheers,
> Zoltan
>
> JenkinsQueueUtil.java:
> https://github.com/apache/hive/blob/f8a671d8cfe8a26d1d12c51f93207e
> c92577c796/testutils/ptest2/src/main/java/org/apache/hive/
> ptest/api/client/JenkinsQueueUtil.java#L82
>
>
>
>


Re: Review Request 66774: HIVE-19285: Add logs to the subclasses of MetaDataOperation

2018-04-25 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66774/#review201906
---


Ship it!




Ship It!

- Peter Vary


On April 24, 2018, 6:35 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66774/
> ---
> 
> (Updated April 24, 2018, 6:35 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-19285
> https://issues.apache.org/jira/browse/HIVE-19285
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Subclasses of MetaDataOperation are not writing anything to the logs. It 
> would be useful to have some INFO and DEBUG level logging in these classes.
> 
> 
> Diffs
> -
> 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  7944467 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  d67ea90 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCrossReferenceOperation.java
>  99ccd4e 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  091bf50 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetPrimaryKeysOperation.java
>  e603fdd 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  de09ec9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  59cfbb2 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  c9233d0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  ac078b4 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> bf7c021 
> 
> 
> Diff: https://reviews.apache.org/r/66774/diff/2/
> 
> 
> Testing
> ---
> 
> Just adding some additional log messages. Tested locally by checking the log 
> messages for different use cases
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



[jira] [Created] (HIVE-19294) grouping sets when contains a constant column

2018-04-25 Thread Song Jun (JIRA)
Song Jun created HIVE-19294:
---

 Summary: grouping sets when contains a constant column
 Key: HIVE-19294
 URL: https://issues.apache.org/jira/browse/HIVE-19294
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.3.2
Reporter: Song Jun


We get different results between Hive 1.2.2 and Hive 2.3.2 with SQL like this:
{code:java}
select 
case when a='all' then 'x' 
 when b=1 then 'y' 
 else 'z' 
end, c 
from ( 
select 
 a,b,count(1) as c
from ( 
 select 
'all' as a,b 
 from test 
) t1 group by a,b grouping sets(a,b) 
) t2;
{code}
We have grouping sets using column a, which is the constant value 'all' in 
its subquery.

 

The result of Hive 1.2.2(same result when set hive.cbo.enable to true or false):
{code:java}
x 3
y 2
z 1   {code}
The result of Hive 2.3.2(same result when set hive.cbo.enable to true or false):
{code:java}
x 3
x 2
x 1{code}
I dug into this on Hive 2.3.2 with hive.cbo.enable=false, and found that the 
ConstantPropagate optimizer rewrites the expression according to the constant 
column value 'all' in the subquery:
{code:java}
case when a='all' then 'x' 
 when b=1 then 'y' 
 else 'z' 
end
{code}
to 
{code:java}
Select Operator
expressions: CASE WHEN (true) THEN ('x') WHEN ((_col1 = 1)) THEN ('y') ELSE 
('z') END (type: string), _col3 (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
{code}
That is, case when a='all' is rewritten as case when (true), so we always get 
the value 'x'.
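The reason the fold looks unsafe: under grouping sets(a,b), column a is NULL in the grouping set that groups only by b, so a='all' is not a tautology even though a is constant in the input. A small Python simulation of the two grouping sets (with hypothetical input rows chosen to match the counts above):

```python
from collections import Counter

# Hypothetical contents of subquery t1: a is the constant 'all', b varies.
rows = [("all", 1), ("all", 1), ("all", 2)]

def grouping_sets_a_b(rows):
    # grouping sets(a, b): aggregate once by a (with b nulled out)
    # and once by b (with a nulled out)
    counts = Counter()
    for a, b in rows:
        counts[(a, None)] += 1
        counts[(None, b)] += 1
    return counts

def label(a, b):
    # the CASE expression as written: a='all' is false (not true!)
    # for the "group by b" grouping set, where a is None
    if a == "all":
        return "x"
    if b == 1:
        return "y"
    return "z"

result = sorted((label(a, b), c)
                for (a, b), c in grouping_sets_a_b(rows).items())
# matches Hive 1.2.2: [('x', 3), ('y', 2), ('z', 1)]
```

Folding a='all' to TRUE, as the 2.3.2 plan above shows, would label all three groups 'x', reproducing the reported wrong result.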

 

So, which result should be correct for the above query? 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal


> On April 14, 2018, 1:13 a.m., Jason Dere wrote:
> > hbase-handler/src/test/results/positive/external_table_ppd.q.out
> > Lines 59 (patched)
> > 
> >
> > Are there any tests for the old-style bucketing, to make sure that 
> > previously created bucketed tables still work properly?
> 
> Deepak Jaiswal wrote:
> That is a good point. Will work on it.

TestAcidOnTez#testMapJoinOnTez is one such test.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/#review201133
---


On April 25, 2018, 7:21 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66567/
> ---
> 
> (Updated April 25, 2018, 7:21 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and 
> Matt McCline.
> 
> 
> Bugs: HIVE-18910
> https://issues.apache.org/jira/browse/HIVE-18910
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive uses the Java hash, which is not as good as Murmur for distribution 
> and efficiency when bucketing a table.
> Migrate to Murmur hash but still keep backward compatibility for existing 
> users so that they don't have to reload their existing tables.
> 
> To keep backward compatibility, bucket_version is added as a table property, 
> resulting in a high number of result updates.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
>   hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
> 153613e6d0 
>   hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  924e233293 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  fe2b1c1f3c 
>   
> hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
>  996329195c 
>   
> hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
>  f9ee9d9a03 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
>  caa00292b8 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
> ab8ad77074 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
>  2b28a6677e 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
>  cdb67dd786 
>   
> itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
>  2c23a7e94f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
>  a1be085ea5 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775286 
>   itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 
> d4363fdf91 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
>  86f466fc4e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
>  4077552a56 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
>  1bc3fdabac 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 71498a125c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
> 7121bceb22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
>  5f65f638ca 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
> 2be3c9b9a2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  1c5656267d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
>  0e995d79d2 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 068f25e75f 
>   

Re: Review Request 66567: Migrate to Murmur hash for shuffle and bucketing

2018-04-25 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66567/
---

(Updated April 25, 2018, 7:21 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Matt 
McCline.


Changes
---

Removed bucketingVersion from Optraits.
Removed the config to create tables with the old bucketing version. Users can 
always do so by setting the table property.
Cleaned up some old code related to BinarySortSerDe which was still there.


Bugs: HIVE-18910
https://issues.apache.org/jira/browse/HIVE-18910


Repository: hive-git


Description
---

Hive uses the Java hash, which is not as good as Murmur for distribution and 
efficiency when bucketing a table.
Migrate to Murmur hash but still keep backward compatibility for existing users 
so that they don't have to reload their existing tables.

To keep backward compatibility, bucket_version is added as a table property, 
resulting in a high number of result updates.
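To illustrate why bucket_version must be recorded per table, here is a hedged Python sketch (not Hive's actual implementation) of the two hash families: a Java-style String.hashCode and a standard Murmur3 x86 32-bit hash. The same key generally lands in a different bucket under each scheme, so files bucketed under the old hash would be read incorrectly if the reader assumed the new one:

```python
def java_string_hash(s: str) -> int:
    # Java's String.hashCode(): h = 31*h + ch, as a signed 32-bit int
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def murmur3_32(data: bytes, seed: int = 0) -> int:
    # Murmur3 x86 32-bit, unsigned result
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data) & ~3
    for i in range(0, n, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    k = 0
    tail = data[n:]
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if tail:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    return h ^ (h >> 16)

def bucket(hash_value: int, num_buckets: int) -> int:
    # non-negative modulo over the bucket count
    return (hash_value & 0x7FFFFFFF) % num_buckets

key, buckets = "customer_007", 8          # hypothetical key and bucket count
old_bucket = bucket(java_string_hash(key), buckets)
new_bucket = bucket(murmur3_32(key.encode()), buckets)
```

Reading a table written with one version while hashing probe keys with the other would route lookups to the wrong bucket file, which is why the table property has to pin the version for pre-existing tables.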


Diffs (updated)
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out cdc43ee560 
  hbase-handler/src/test/results/positive/hbase_binary_storage_queries.q.out 
153613e6d0 
  hbase-handler/src/test/results/positive/hbase_ddl.q.out ef3f5f704e 
  hbase-handler/src/test/results/positive/hbasestats.q.out 5d000d2f4f 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 924e233293 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 fe2b1c1f3c 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatTable.java
 996329195c 
  
hcatalog/webhcat/java-client/src/test/java/org/apache/hive/hcatalog/api/TestHCatClient.java
 f9ee9d9a03 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
 caa00292b8 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 
ab8ad77074 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
 2b28a6677e 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
 cdb67dd786 
  
itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
 2c23a7e94f 
  
itests/hive-blobstore/src/test/results/clientpositive/write_final_output_blobstore.q.out
 a1be085ea5 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 82ba775286 
  itests/src/test/resources/testconfiguration.properties 2c1a76d89b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java c084fa054c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d59bf1fb6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java c28ef99621 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 21ca04d78a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java d4363fdf91 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6395c31ec7 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/keyseries/VectorKeySeriesSerializedImpl.java
 86f466fc4e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java
 4077552a56 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
 1bc3fdabac 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 71498a125c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java dc6cc62fbb 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7121bceb22 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java
 5f65f638ca 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java 
2be3c9b9a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 1c5656267d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 0e995d79d2 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
068f25e75f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a00f9279c0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java dde20ed56e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java aa3c72bc6d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 25b91899de 
  ql/src/java/org/apache/hadoop/hive/ql/plan/VectorReduceSinkDesc.java 
adea3b53a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java 
7cd571815d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMurmurHash.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java 7f7bc11410 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 12d57c6feb