[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-05-08 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835734#comment-16835734
 ] 

slim bouguerra commented on HIVE-21240:
---

+1. Thanks [~belugabehr]; pushed to master via [commit 
2b7e99708f5|https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=2b7e99708f546fa7dbc4b6dfd3a9fcfe69b186dd]

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.12.patch, 
> HIVE-21240.12.patch, HIVE-21240.12.patch, HIVE-21240.12.patch, 
> HIVE-21240.12.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch, 
> kafka_storage_handler.diff
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of parsing manually
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support for skipping blank lines (returns all columns as null values)
> * The current JSON parser accepts, but does not apply, custom timestamp 
> formats in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, currently O\(n\) 
> for each row processed, for each column in the row (see the sketch below)
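
A minimal sketch of the column-name to column-index cache idea, assuming the 
SerDe keeps its column names in a List; all names below are illustrative, not 
taken from the HIVE-21240 patch:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Memoize name -> index lookups so only the first lookup per name pays O(n).
class ColumnIndexCache {
  private final Map<String, Integer> cache = new HashMap<>();
  private final List<String> columnNames;

  ColumnIndexCache(List<String> columnNames) {
    this.columnNames = columnNames;
  }

  int indexOf(String columnName) {
    // computeIfAbsent runs List.indexOf (O(n)) once per name, then serves
    // O(1) hits; unknown names cache -1, the List.indexOf "not found" value.
    return cache.computeIfAbsent(columnName, columnNames::indexOf);
  }
}
{code}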





[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write

2019-05-08 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21240:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=2b7e99708f546fa7dbc4b6dfd3a9fcfe69b186dd

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.12.patch, 
> HIVE-21240.12.patch, HIVE-21240.12.patch, HIVE-21240.12.patch, 
> HIVE-21240.12.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch, 
> kafka_storage_handler.diff
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of parsing manually
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support for skipping blank lines (returns all columns as null values)
> * The current JSON parser accepts, but does not apply, custom timestamp 
> formats in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, currently O\(n\) 
> for each row processed, for each column in the row





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-08 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Description: 
Current logic used by brute-force eviction can lead to a perpetual random 
eviction pattern.
For instance, if the cache builds a small pocket of free memory whose total 
size is greater than the incoming allocation request, the allocator will 
randomly evict blocks that fit a particular size.
This can happen over and over, so all evictions end up being random.
In addition, this random eviction leads to a leak in the linked list 
maintained by the policy, since the policy no longer knows what has been 
evicted and what has not.

The improvement from this patch on the TPC-DS benchmark is very substantial. 
I tested it at 10TB scale with 9 LLAP nodes and a 32GB cache per node. The 
patch showed a very noticeable difference in hit rate; see the raw numbers in 
[^Cache_hitrate_improvement.csv] 


  was:
Current logic used by brute-force eviction can lead to a perpetual random 
eviction pattern.
For instance, if the cache builds a small pocket of free memory whose total 
size is greater than the incoming allocation request, the allocator will 
randomly evict blocks that fit a particular size.
This can happen over and over, so all evictions end up being random.
In addition, this random eviction leads to a leak in the linked list 
maintained by the policy, since the policy no longer knows what has been 
evicted and what has not.



> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: Cache_hitrate_improvement.csv, HIVE-21686.2.patch, 
> HIVE-21686.patch
>
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.
> The improvement from this patch on the TPC-DS benchmark is very substantial. 
> I tested it at 10TB scale with 9 LLAP nodes and a 32GB cache per node. The 
> patch showed a very noticeable difference in hit rate; see the raw numbers 
> in [^Cache_hitrate_improvement.csv] 
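
An illustrative sketch of the linked-list leak described above (made-up names, 
not the actual LLAP cache code): the policy tracks blocks in its own list, so 
evicting a block behind the policy's back strands the corresponding node.

{code:java}
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

class PolicyLeakSketch {
  private final LinkedList<String> lruOrder = new LinkedList<>(); // the policy's view
  private final Map<String, byte[]> blocks = new HashMap<>();     // the allocator's view

  void cache(String id, byte[] data) {
    blocks.put(id, data);
    lruOrder.addFirst(id);
  }

  // Brute-force eviction as described: frees the block behind the policy's back.
  void bruteForceEvict(String id) {
    blocks.remove(id);
    // BUG: lruOrder still contains `id`; the policy now tracks a ghost entry.
  }

  // A coordinated eviction keeps both views consistent.
  void policyEvict() {
    String victim = lruOrder.removeLast();
    blocks.remove(victim);
  }
}
{code}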





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-08 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Attachment: Cache_hitrate_improvement.csv

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: Cache_hitrate_improvement.csv, HIVE-21686.2.patch, 
> HIVE-21686.patch
>
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.





[jira] [Commented] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-08 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835970#comment-16835970
 ] 

slim bouguerra commented on HIVE-21686:
---

[~t3rmin4t0r] Can you please take a look at this?

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: Cache_hitrate_improvement.csv, HIVE-21686.2.patch, 
> HIVE-21686.patch
>
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.
> The improvement from this patch on the TPC-DS benchmark is very substantial. 
> I tested it at 10TB scale with 9 LLAP nodes and a 32GB cache per node. The 
> patch showed a very noticeable difference in hit rate; see the raw numbers 
> in [^Cache_hitrate_improvement.csv] 





[jira] [Commented] (HIVE-21391) LLAP: Pool of column vector buffers can cause memory pressure

2019-05-21 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845032#comment-16845032
 ] 

slim bouguerra commented on HIVE-21391:
---

[~prasanth_j] I took a look at this, and I think weak refs will not solve the 
memory-pressure issue.
In fact, the root cause is the size of the blocking queue between the IO thread 
(the producer) and the pipeline-processor thread (the consumer).
*Why?*
Please note that, as of now, the number of buffered CVBs is bounded not by the 
size of the CVB pool but by the size of the blocking queue.
Currently, no matter how big your pool is, fast IO will create as many CVBs as 
needed to fill the processor queue.
Therefore, at any point in time the number of live CVBs equals the size of the 
queue + 2 (one being used by the processor and a second being filled by the IO 
threads); the sketch below illustrates this.
To conclude, setting the pool size to one or using weak refs will not change 
the equation, because the blocking queue needs to hold a strong reference to 
each CVB.
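
A minimal sketch of this backpressure relationship, assuming a plain bounded 
queue between one producer and one consumer (illustrative only, not the LLAP 
code):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueBackpressureSketch {
  public static void main(String[] args) throws InterruptedException {
    final int queueSize = 10; // analogous to the minimum VRB queue limit
    BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueSize);

    Thread ioThread = new Thread(() -> {
      try {
        while (true) {
          // The producer allocates a fresh buffer per batch; the queue holds
          // a strong reference, so weak refs elsewhere cannot reclaim them.
          queue.put(new byte[1024]);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    ioThread.start();

    Thread.sleep(100);
    // At steady state the queue is full (queueSize buffers), plus one buffer
    // being filled by the producer and one held by a consumer: queueSize + 2.
    System.out.println("buffers queued: " + queue.size());
    ioThread.interrupt();
  }
}
{code}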

*How to fix this?*
To fix the issue we need to make the blocking queue size more realistic:
* Currently the minimum queue size is 10 CVBs. You can see how this can fire 
back: if you scan 2000 decimal columns, that is about 1GB per executor.
* Currently a decimal is weighted as 4 times a Decimal64 (see 
LlapRecordReader#COL_WEIGHT_HIVEDECIMAL), which is wrong; as you can see from 
the object layout below, a writable decimal is at least 10 times bigger:
{code}
 /*
 org.apache.hadoop.hive.serde2.io.HiveDecimalWritable object internals:
  OFFSET  SIZE     TYPE  DESCRIPTION                                VALUE
       0    16           (object header)                            N/A
      16     8     long  FastHiveDecimal.fast2                      N/A
      24     8     long  FastHiveDecimal.fast1                      N/A
      32     8     long  FastHiveDecimal.fast0                      N/A
      40     4      int  FastHiveDecimal.fastSignum                 N/A
      44     4      int  FastHiveDecimal.fastIntegerDigitCount      N/A
      48     4      int  FastHiveDecimal.fastScale                  N/A
      52     4      int  FastHiveDecimal.fastSerializationScale     N/A
      56     1  boolean  HiveDecimalWritable.isSet                  N/A
      57     7           (alignment/padding gap)
      64     8   long[]  HiveDecimalWritable.internalScratchLongs   N/A
      72     8   byte[]  HiveDecimalWritable.internalScratchBuffer  N/A
 Instance size: 80 bytes
 */
{code}

* Also, we might need to use a pseudo-linear increase as a function of the 
column count n, something like n log n (or n sqrt n), so that wide-column 
scans get less memory.
* To make the size more rational, I will try to make it relate to bytes instead 
of being unit-less (see the arithmetic sketch below). If you guys are okay with 
this, I propose removing LLAP_IO_VRB_QUEUE_LIMIT_MIN and replacing it with a 
desired size in bytes, say 1GB by default? 
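
A back-of-the-envelope sketch of what a byte-denominated queue limit could 
look like; the helper and its parameters are assumptions for illustration, not 
the actual LLAP configuration logic:

{code:java}
public class QueueLimitSketch {
  // Derive the queue limit from a byte budget instead of a unit-less count.
  static int queueLimit(long budgetBytes, int numCols, int batchSize, int perValueBytes) {
    long bytesPerBatch = (long) numCols * batchSize * perValueBytes;
    return (int) Math.max(1, budgetBytes / bytesPerBatch);
  }

  public static void main(String[] args) {
    // 1GB budget, 2000 decimal columns, 1024-row batches,
    // ~80 bytes per HiveDecimalWritable (per the object layout above):
    System.out.println(queueLimit(1L << 30, 2000, 1024, 80)); // prints 6
  }
}
{code}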

 

> LLAP: Pool of column vector buffers can cause memory pressure
> -
>
> Key: HIVE-21391
> URL: https://issues.apache.org/jira/browse/HIVE-21391
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21391.1.patch
>
>
> When there are too many columns (on the order of 100s) with decimal or string 
> types, the column vector pool of buffers created here 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/EncodedDataConsumer.java#L59]
>  can cause memory pressure. 
> Example:
> 128 (poolSize) * 300 (numCols) * 1024 (batchSize) * 80 (decimalSize) ~= 3GB
> The pool size keeps increasing when there is a slow consumer but fast LLAP IO 
> (SSDs), leading to GC pressure when all LLAP IO threads read splits from the 
> same table. 





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-14 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Attachment: HIVE-21686.5.patch

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>  Labels: pull-request-available
> Attachments: Cache_hitrate_improvement.csv, HIVE-21686.2.patch, 
> HIVE-21686.3.patch, HIVE-21686.4.patch, HIVE-21686.5.patch, HIVE-21686.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.
> The improvement from this patch on the TPC-DS benchmark is very substantial. 
> I tested it at 10TB scale with 9 LLAP nodes and a 32GB cache per node. The 
> patch showed a very noticeable difference in hit rate; see the raw numbers 
> in [^Cache_hitrate_improvement.csv] 





[jira] [Commented] (HIVE-21665) Unable to reconstruct valid SQL query from AST when back ticks are used

2019-04-29 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829831#comment-16829831
 ] 

slim bouguerra commented on HIVE-21665:
---

cc [~vgarg] / [~jdere]

> Unable to reconstruct valid SQL query from AST when back ticks are used
> ---
>
> Key: HIVE-21665
> URL: https://issues.apache.org/jira/browse/HIVE-21665
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Priority: Major
>
> HIVE-6013 introduced a parser rule that removes all the
> {code:java}
> `{code}
> characters from identifiers and query aliases; this can result in issues when 
> we need to reconstruct the actual SQL query from the AST.
> To reproduce the bug, you can use an explain analyze statement such as the 
> following query:
> {code:java}
> explain analyze select 'literal' as `alias with space`;
> {code}
> This bug will affect the Ranger plugin and probably the results cache, since 
> in both places we need to reconstruct the query from the AST.
>  The current workaround is to avoid white space within aliases. 





[jira] [Updated] (HIVE-21665) Unable to reconstruct valid SQL query from AST when back ticks are used

2019-04-29 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21665:
--
Affects Version/s: 4.0.0

> Unable to reconstruct valid SQL query from AST when back ticks are used
> ---
>
> Key: HIVE-21665
> URL: https://issues.apache.org/jira/browse/HIVE-21665
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: slim bouguerra
>Priority: Major
>
> HIVE-6013 introduced a parser rule that removes all the
> {code:java}
> `{code}
> characters from identifiers and query aliases; this can result in issues when 
> we need to reconstruct the actual SQL query from the AST.
> To reproduce the bug, you can use an explain analyze statement such as the 
> following query:
> {code:java}
> explain analyze select 'literal' as `alias with space`;
> {code}
> This bug will affect the Ranger plugin and probably the results cache, since 
> in both places we need to reconstruct the query from the AST.
>  The current workaround is to avoid white space within aliases. 





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-07 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Attachment: HIVE-21686.2.patch

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-21686.2.patch, HIVE-21686.patch
>
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.





[jira] [Updated] (HIVE-21621) Update Kafka Clients to recent release 2.2.0

2019-04-18 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21621:
--
Attachment: HIVE-21621.4.patch

> Update Kafka Clients to recent release 2.2.0
> 
>
> Key: HIVE-21621
> URL: https://issues.apache.org/jira/browse/HIVE-21621
> Project: Hive
>  Issue Type: Task
>  Components: kafka integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-21621.2.patch, HIVE-21621.3.patch, 
> HIVE-21621.3.patch, HIVE-21621.4.patch, HIVE-21621.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> All in the title: update the Kafka Storage Handler to the most recent 
> clients library.





[jira] [Updated] (HIVE-21621) Update Kafka Clients to recent release 2.2.0

2019-04-18 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21621:
--
Attachment: HIVE-21621.5.patch

> Update Kafka Clients to recent release 2.2.0
> 
>
> Key: HIVE-21621
> URL: https://issues.apache.org/jira/browse/HIVE-21621
> Project: Hive
>  Issue Type: Task
>  Components: kafka integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-21621.2.patch, HIVE-21621.3.patch, 
> HIVE-21621.3.patch, HIVE-21621.4.patch, HIVE-21621.5.patch, HIVE-21621.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> All in the title: update the Kafka Storage Handler to the most recent 
> clients library.





[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822250#comment-16822250
 ] 

slim bouguerra commented on HIVE-21240:
---

I have left some comments; can you please address those and update the pull 
request and patch?

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of parsing manually
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support for skipping blank lines (returns all columns as null values)
> * The current JSON parser accepts, but does not apply, custom timestamp 
> formats in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, currently O\(n\) 
> for each row processed, for each column in the row





[jira] [Assigned] (HIVE-21689) Buddy Allocator memory accounting does not account for failed allocation attempts

2019-05-03 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-21689:
-


> Buddy Allocator memory accounting does not account for failed allocation 
> attempts
> -
>
> Key: HIVE-21689
> URL: https://issues.apache.org/jira/browse/HIVE-21689
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> The allocation method of the Buddy Allocator does not release the reserved 
> memory in case we fail to allocate the full sequence.
> Simple example:
> Assume we have an allocation request of 1KB.
> We call reserve and reserve 1KB.
> The allocate attempt fails due to a race condition.
> The discard attempt fails due to no space being available.
> At this point we exit without releasing the reserved memory.
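
A minimal sketch of the leak pattern described above; the class and method 
names are made up for illustration, not taken from the actual BuddyAllocator 
code:

{code:java}
class ReservationSketch {
  private long reservedBytes = 0;

  boolean allocate(long size) {
    reserve(size);                                 // memory is now "spoken for"
    if (tryAllocateFromFreeLists(size)) return true;
    if (tryDiscardAndAllocate(size)) return true;
    // The reported bug: exiting here without releasing the reservation leaks
    // `size` bytes of accounting. The fix is to release on the failure path:
    release(size);
    return false;
  }

  private void reserve(long size) { reservedBytes += size; }
  private void release(long size) { reservedBytes -= size; }
  private boolean tryAllocateFromFreeLists(long size) { return false; } // can fail under races
  private boolean tryDiscardAndAllocate(long size) { return false; }    // can fail when no space
}
{code}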





[jira] [Commented] (HIVE-21689) Buddy Allocator memory accounting does not account for failed allocation attempts

2019-05-03 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832949#comment-16832949
 ] 

slim bouguerra commented on HIVE-21689:
---

A unit test can be found here:
https://gist.github.com/b-slim/f12aad2a2406d9ee15f338e0e760d3a5


> Buddy Allocator memory accounting does not account for failed allocation 
> attempts
> -
>
> Key: HIVE-21689
> URL: https://issues.apache.org/jira/browse/HIVE-21689
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> The allocation method of the Buddy Allocator does not release the reserved 
> memory in case we fail to allocate the full sequence.
> Simple example:
> Assume we have an allocation request of 1KB.
> We call reserve and reserve 1KB.
> The allocate attempt fails due to a race condition.
> The discard attempt fails due to no space being available.
> At this point we exit without releasing the reserved memory.





[jira] [Assigned] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-02 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-21686:
-


> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-02 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Attachment: HIVE-21686.patch

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-21686.patch
>
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.





[jira] [Updated] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-02 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21686:
--
Status: Patch Available  (was: Open)

> Brute Force eviction can lead to a random uncontrolled eviction pattern.
> 
>
> Key: HIVE-21686
> URL: https://issues.apache.org/jira/browse/HIVE-21686
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Current logic used by brute-force eviction can lead to a perpetual random 
> eviction pattern.
> For instance, if the cache builds a small pocket of free memory whose total 
> size is greater than the incoming allocation request, the allocator will 
> randomly evict blocks that fit a particular size.
> This can happen over and over, so all evictions end up being random.
> In addition, this random eviction leads to a leak in the linked list 
> maintained by the policy, since the policy no longer knows what has been 
> evicted and what has not.





[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing everything

2019-07-11 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883221#comment-16883221
 ] 

slim bouguerra commented on HIVE-21934:
---

+1

> Materialized view on top of Druid not pushing everything
> 
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
>  Issue Type: Improvement
>  Components: Druid integration, Materialized views
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21934.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The title is not very informative, but the examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON 
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing 
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
>  Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | 
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
>  |
> | |
> ++
>  
> {code}
> If I use a simple Druid table without the MV: 
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | 
> 

[jira] [Updated] (HIVE-21621) Update Kafka Clients to recent release 2.2.0

2019-04-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21621:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Update Kafka Clients to recent release 2.2.0
> 
>
> Key: HIVE-21621
> URL: https://issues.apache.org/jira/browse/HIVE-21621
> Project: Hive
>  Issue Type: Task
>  Components: kafka integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-21621.2.patch, HIVE-21621.3.patch, 
> HIVE-21621.3.patch, HIVE-21621.4.patch, HIVE-21621.5.patch, HIVE-21621.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All in the title: update the Kafka Storage Handler to the most recent 
> clients library.





[jira] [Updated] (HIVE-21621) Update Kafka Clients to recent release 2.2.0

2019-04-17 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21621:
--
Attachment: HIVE-21621.3.patch

> Update Kafka Clients to recent release 2.2.0
> 
>
> Key: HIVE-21621
> URL: https://issues.apache.org/jira/browse/HIVE-21621
> Project: Hive
>  Issue Type: Task
>  Components: kafka integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-21621.2.patch, HIVE-21621.3.patch, HIVE-21621.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> All in the title: update the Kafka Storage Handler to the most recent 
> clients library.





[jira] [Commented] (HIVE-21570) Convert llap iomem servlets output to json format

2019-04-22 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823144#comment-16823144
 ] 

slim bouguerra commented on HIVE-21570:
---

[~asinkovits] Thanks for doing this, and sorry for the late response. Can you 
please submit a pull request where I can add comments?
 Here are 3 high-level comments:
 * There is no need to wrap numeric values with String.valueOf; the JSON 
writer can handle that, plus it keeps the typing correct.
 * A lot of the HashMap creation can be avoided by using EnumMaps (maybe), 
since we know all the keys upfront (see the sketch below).
 * Some documentation about the query parameter is missing, and probably some 
testing; can you please provide examples of how it might be used?
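
A small sketch of the EnumMap suggestion; the enum and its values here are 
made up for illustration:

{code:java}
import java.util.EnumMap;
import java.util.Map;

public class EnumMapSketch {
  // Hypothetical fixed key set known upfront.
  enum CacheStat { HIT, MISS, EVICTED }

  public static void main(String[] args) {
    // Array-backed with typed keys and no hashing: cheaper than a
    // HashMap<String, Long> when the keys are a known enum.
    Map<CacheStat, Long> stats = new EnumMap<>(CacheStat.class);
    stats.put(CacheStat.HIT, 42L);
    System.out.println(stats);
  }
}
{code}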

Thanks

> Convert llap iomem servlets output to json format
> -
>
> Key: HIVE-21570
> URL: https://issues.apache.org/jira/browse/HIVE-21570
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
> Attachments: HIVE-21570.01.patch, HIVE-21570.02.patch, 
> HIVE-21570.03.patch, HIVE-21570.04.patch
>
>






[jira] [Updated] (HIVE-21934) Materialized view on top of Druid not pushing everything

2019-06-28 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21934:
--
Summary: Materialized view on top of Druid not pushing everything  (was: 
Materialized view on top of Druid not pushing every thing)

> Materialized view on top of Druid not pushing everything
> 
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The title is not very informative, but the examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON 
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing 
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
>  Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | 
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
>  |
> | |
> ++
>  
> {code}
> If I use a simple Druid table without the MV: 
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | 
> 

[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing every thing

2019-06-28 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875141#comment-16875141
 ] 

slim bouguerra commented on HIVE-21934:
---

Source tables:

{code}

CREATE TABLE `lineorder_n0`(
 `lo_orderkey` bigint, 
 `lo_linenumber` int, 
 `lo_custkey` bigint not null disable rely,
 `lo_partkey` bigint not null disable rely,
 `lo_suppkey` bigint not null disable rely,
 `lo_orderdate` bigint not null disable rely,
 `lo_ordpriority` string, 
 `lo_shippriority` string, 
 `lo_quantity` double, 
 `lo_extendedprice` double, 
 `lo_ordtotalprice` double, 
 `lo_discount` double, 
 `lo_revenue` double, 
 `lo_supplycost` double, 
 `lo_tax` double, 
 `lo_commitdate` bigint, 
 `lo_shipmode` string,
 primary key (`lo_orderkey`) disable rely,
 constraint fk21 foreign key (`lo_custkey`) references 
`customer_n1`(`c_custkey`) disable rely,
 constraint fk22 foreign key (`lo_orderdate`) references 
`dates_n1`(`d_datekey`) disable rely,
 constraint fk23 foreign key (`lo_partkey`) references 
`ssb_part_n0`(`p_partkey`) disable rely,
 constraint fk24 foreign key (`lo_suppkey`) references 
`supplier_n0`(`s_suppkey`) disable rely)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

{code}

 

{code}

CREATE TABLE `dates_n1`(
 `d_datekey` bigint, 
 `__time` timestamp,
 `d_date` string, 
 `d_dayofweek` string, 
 `d_month` string, 
 `d_year` int, 
 `d_yearmonthnum` int, 
 `d_yearmonth` string, 
 `d_daynuminweek` int,
 `d_daynuminmonth` int,
 `d_daynuminyear` int,
 `d_monthnuminyear` int,
 `d_weeknuminyear` int,
 `d_sellingseason` string,
 `d_lastdayinweekfl` int,
 `d_lastdayinmonthfl` int,
 `d_holidayfl` int,
 `d_weekdayfl` int,
 primary key (`d_datekey`) disable rely
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

{code}

> Materialized view on top of Druid not pushing every thing
> -
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The title is not very informative, but the examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON 
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing 
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
>  Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | 
> 

[jira] [Assigned] (HIVE-21934) Materialized view on top of Druid not pushing every thing

2019-06-28 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-21934:
-


> Materialized view on top of Druid not pushing every thing
> -
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The title is not very informative, but the examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON 
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing 
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
>  Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | 
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
>  |
> | |
> ++
>  
> {code}
> If I use a simple Druid table without the MV: 
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | 
> 

[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing every thing

2019-06-28 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875140#comment-16875140
 ] 

slim bouguerra commented on HIVE-21934:
---

View definition:

{code}


CREATE MATERIALIZED VIEW `ssb_mv_druid_100`
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
 "druid.segment.granularity" = "MONTH",
 "druid.query.granularity" = "DAY")
AS
SELECT
 `__time` as `__time` ,
 cast(c_city as string) c_city,
 cast(c_nation as string) c_nation,
 cast(c_region as string) c_region,
 c_mktsegment as c_mktsegment,
 cast(d_weeknuminyear as string) d_weeknuminyear,
 cast(d_year as string) d_year,
 cast(d_yearmonth as string) d_yearmonth,
 cast(d_yearmonthnum as string) d_yearmonthnum,
 cast(p_brand1 as string) p_brand1,
 cast(p_category as string) p_category,
 cast(p_mfgr as string) p_mfgr,
 p_type,
 s_name,
 cast(s_city as string) s_city,
 cast(s_nation as string) s_nation,
 cast(s_region as string) s_region,
 cast(`lo_ordpriority` as string) lo_ordpriority, 
 cast(`lo_shippriority` as string) lo_shippriority, 
 `d_sellingseason`,
 `lo_shipmode`, 
 lo_revenue,
 lo_supplycost ,
 lo_discount ,
 `lo_quantity`, 
 `lo_extendedprice`, 
 `lo_ordtotalprice`, 
 lo_extendedprice * lo_discount discounted_price,
 lo_revenue - lo_supplycost net_revenue
FROM
 customer_n1, dates_n1, lineorder_n1, ssb_part_n0, supplier_n0
where
 lo_orderdate = d_datekey
 and lo_partkey = p_partkey
 and lo_suppkey = s_suppkey
 and lo_custkey = c_custkey;

{code}

> Materialized view on top of Druid not pushing every thing
> -
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> The title is not very informative, but the examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON 
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing 
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
>  Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
>  KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | 
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
>  |
> | |
> ++
>  
> {code}
> If I use a simple Druid table without the MV: 
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 

[jira] [Assigned] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-14 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-22115:
-


> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Avoid the creation and registration of the query-router logger if the 
> following HiveServer2 property is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}
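
A hedged sketch of the proposed gating; only the 
HIVE_SERVER2_LOGGING_OPERATION_ENABLED property above is taken from this 
issue, and the registration hook shown is a hypothetical placeholder:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class RouterLoggerGate {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Create and register the query-routing appender only when operation
    // logging is enabled; otherwise skip the work entirely.
    if (conf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)) {
      registerQueryRoutingAppender(conf); // hypothetical registration hook
    }
  }

  private static void registerQueryRoutingAppender(HiveConf conf) {
    // Placeholder for the actual appender registration (e.g. a Log4j2 routing
    // appender keyed by query id); intentionally left abstract here.
  }
}
{code}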





[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-14 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Attachment: HIVE-22115.patch

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22115.patch
>
>
> Avoid the creation and registration of the query-router logger if the 
> following HiveServer2 property is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}





[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-14 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Status: Patch Available  (was: In Progress)

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Avoid the creation and registration of the query-router logger if the 
> following HiveServer2 property is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}





[jira] [Work started] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-14 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22115 started by slim bouguerra.
-
> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Avoid the creation and registration of the query-router logger if the 
> following HiveServer2 property is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}





[jira] [Assigned] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-16 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-22125:
-


> Move to Kafka 2.3 Clients
> -
>
> Key: HIVE-22125
> URL: https://issues.apache.org/jira/browse/HIVE-22125
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>






[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-15 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Attachment: HIVE-22115.patch

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22115.patch, HIVE-22115.patch
>
>
> Avoid the creation and registration of the query-router logger if the Hive 
> server property below is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-15 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Attachment: HIVE-22115.patch

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22115.patch, HIVE-22115.patch, HIVE-22115.patch
>
>
> Avoid the creation and registration of the query-router logger if the Hive 
> server property below is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HIVE-22113) Prevent LLAP shutdown on AMReporter related RuntimeException

2019-08-15 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22113:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, Oli.
[https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=bd42f23d49d9948f690a14675d6e77830adddfef]

> Prevent LLAP shutdown on AMReporter related RuntimeException
> 
>
> Key: HIVE-22113
> URL: https://issues.apache.org/jira/browse/HIVE-22113
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.1.1
>Reporter: Oliver Draese
>Assignee: Oliver Draese
>Priority: Major
>  Labels: llap
> Attachments: HIVE-22113.1.patch, HIVE-22113.2.patch, HIVE-22113.patch
>
>
> If a task attempt cannot be removed from AMReporter (i.e. task attempt was 
> not found), the AMReporter throws a RuntimeException. This exception is not 
> caught and trickles up, causing an LLAP shutdown:
> {code}
> 2019-08-08T23:34:39,748 [Wait-Queue-Scheduler-0,5,main] java.lang.RuntimeException: ..._1563528877295_18872_3728_01_03_0't
>   at org.apache.hadoop.hive.llap.daemon.impl.AMReporter$AMNodeInfo.removeTaskAttempt(AMReporter.java:524) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at ...(AMReporter.java:243) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at ...(TaskRunnerCallable.java:384) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at ...(TaskExecutorService.java:739) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at ...$1100(TaskExecutorService.java:91) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at ...$WaitQueueWorker.run(TaskExecutorService.java:396) ~[hive-llap-server-3.1.0.3.1.0.103-1.jar:3.1.0.3.1.0.103-1]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [hive-exec-3.1.0.3.1.0.103-1.jar:3.1.0-SNAPSHOT]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
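
The fix implied by this trace is a defensive catch around the AMReporter call, so a missing task attempt cannot take down the daemon. A hedged sketch of that pattern; the wrapper and its names are assumptions, not the actual patch:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: keep the wait-queue worker alive when AMReporter complains
// that the task attempt was already removed.
final class SafeAmReporterCall {
  private static final Logger LOG = LoggerFactory.getLogger(SafeAmReporterCall.class);

  static void unregisterQuietly(Runnable unregisterCall) {
    try {
      unregisterCall.run(); // e.g. the AMReporter removeTaskAttempt path
    } catch (RuntimeException e) {
      // Attempt already gone: log it instead of letting the exception
      // trickle up and shut down LLAP.
      LOG.warn("Ignoring failure to remove task attempt from AMReporter", e);
    }
  }
}
{code}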


[jira] [Updated] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-21 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22125:
--
Component/s: kafka integration

> Move to Kafka 2.3 Clients
> -
>
> Key: HIVE-22125
> URL: https://issues.apache.org/jira/browse/HIVE-22125
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22125.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-21 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22125:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=1cf38b648e9517cbd84ee7faada72996bc757e56]

> Move to Kafka 2.3 Clients
> -
>
> Key: HIVE-22125
> URL: https://issues.apache.org/jira/browse/HIVE-22125
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22125.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22127) Query Routing logging appender is leaking resources of RandomAccessFileManager.

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22127:
--
Component/s: Logging

> Query Routing logging appender is leaking resources of 
> RandomAccessFileManager.
> ---
>
> Key: HIVE-22127
> URL: https://issues.apache.org/jira/browse/HIVE-22127
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> The query-routing appender registered by
> {code:java}
> org.apache.hadoop.hive.ql.log.LogDivertAppender#registerRoutingAppender
> {code}
> is leaking references to
> {code}
> org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender
> {code}
> when closing operation hooks in
> {code}
> org.apache.hive.service.cli.operation.Operation#cleanupOperationLog
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
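
The leak pattern here is a per-query Log4j2 appender whose file manager is never released. A sketch of one way to plug this kind of leak using the public Log4j2 API; the appender naming is an assumption:

{code:java}
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.Appender;
import org.apache.logging.log4j.core.LoggerContext;

// Sketch only: explicitly stop the per-query appender during operation-log
// cleanup so its RandomAccessFileManager (and file handle) are released.
final class OperationLogCleanupSketch {
  static void stopQueryAppender(String appenderName) {
    LoggerContext ctx = (LoggerContext) LogManager.getContext(false);
    Appender appender = ctx.getConfiguration().getAppenders().get(appenderName);
    if (appender != null && appender.isStarted()) {
      appender.stop(); // releases the underlying file manager
    }
  }
}
{code}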


[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed by commit c0341dcf0602b06fd4e8441d833d708b709164a2.

 

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22115.patch, HIVE-22115.patch, HIVE-22115.patch
>
>
> Avoid the creation and registration of the query-router logger if the Hive 
> server property below is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22115:
--
Component/s: Logging

> Prevent the creation of query-router logger in HS2 as per property
> --
>
> Key: HIVE-22115
> URL: https://issues.apache.org/jira/browse/HIVE-22115
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22115.patch, HIVE-22115.patch, HIVE-22115.patch
>
>
> Avoid the creation and registration of the query-router logger if the Hive 
> server property below is set to false by the user:
> {code}
> HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (HIVE-22127) Query Routing logging appender is leaking resources of RandomAccessFileManager.

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-22127:
-


> Query Routing logging appender is leaking resources of 
> RandomAccessFileManager.
> ---
>
> Key: HIVE-22127
> URL: https://issues.apache.org/jira/browse/HIVE-22127
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> The query-routing appender registered by
> {code:java}
> org.apache.hadoop.hive.ql.log.LogDivertAppender#registerRoutingAppender
> {code}
> is leaking references to
> {code}
> org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender
> {code}
> when closing operation hooks in
> {code}
> org.apache.hive.service.cli.operation.Operation#cleanupOperationLog
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22125:
--
Attachment: HIVE-22125.patch

> Move to Kafka 2.3 Clients
> -
>
> Key: HIVE-22125
> URL: https://issues.apache.org/jira/browse/HIVE-22125
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22125.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-20 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22125:
--
Status: Patch Available  (was: Open)

> Move to Kafka 2.3 Clients
> -
>
> Key: HIVE-22125
> URL: https://issues.apache.org/jira/browse/HIVE-22125
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22125.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (HIVE-22106) PCR: Remove cross-query synchronization for the partition-eval

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-22106:
-

Assignee: slim bouguerra  (was: Gopal V)

> PCR: Remove cross-query synchronization for the partition-eval 
> ---
>
> Key: HIVE-22106
> URL: https://issues.apache.org/jira/browse/HIVE-22106
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
>
> {code}
> HiveServer2-Handler-Pool: Thread-492  Blocked CPU usage on sample: 0ms
>   
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.evalExprWithPart(ExprNodeDesc,
>  Partition, List, StructObjectInspector) PartExprEvalUtils.java:58
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory.evalExprWithPart(ExprNodeDesc,
>  Partition, List) PcrExprProcFactory.java:83
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.handleDeterministicUdf(PcrExprProcCtx,
>  ExprNodeGenericFuncDesc, Object[]) PcrExprProcFactory.java:317
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.process(Node,
>  Stack, NodeProcessorCtx, Object[]) PcrExprProcFactory.java:298
>   org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(Node, Stack, 
> Object[]) DefaultRuleDispatcher.java:90
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(Node, 
> Stack) DefaultGraphWalker.java:105
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(Node, Stack) 
> DefaultGraphWalker.java:89
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(Node) 
> DefaultGraphWalker.java:158
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(Collection, 
> HashMap) DefaultGraphWalker.java:120
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
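
The stack above shows HiveServer2 handler threads blocked inside partition-expression evaluation, i.e. on a lock shared across queries. A hedged sketch of what removing such cross-query synchronization looks like; names are illustrative, not the actual patch:

{code:java}
// Sketch only: replace a class-level lock that serializes evaluation across
// all queries with per-call, thread-confined state.
final class PartEvalSketch {

  // Before: every caller in the process contends on the same class lock.
  static synchronized Object evalWithSharedState(Object expr) {
    return expr; // placeholder for evaluation against shared mutable state
  }

  // After: each call owns its evaluator, so no global lock is needed.
  static Object evalPerCall(Object expr) {
    return new Evaluator(expr).evaluate();
  }

  private static final class Evaluator {
    private final Object expr;
    Evaluator(Object expr) { this.expr = expr; }
    Object evaluate() { return expr; } // placeholder for real evaluation
  }
}
{code}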


[jira] [Updated] (HIVE-22106) PCR: Remove cross-query synchronization for the partition-eval

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22106:
--
Status: Patch Available  (was: Open)

> PCR: Remove cross-query synchronization for the partition-eval 
> ---
>
> Key: HIVE-22106
> URL: https://issues.apache.org/jira/browse/HIVE-22106
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
>
> {code}
> HiveServer2-Handler-Pool: Thread-492  Blocked CPU usage on sample: 0ms
>   
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.evalExprWithPart(ExprNodeDesc,
>  Partition, List, StructObjectInspector) PartExprEvalUtils.java:58
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory.evalExprWithPart(ExprNodeDesc,
>  Partition, List) PcrExprProcFactory.java:83
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.handleDeterministicUdf(PcrExprProcCtx,
>  ExprNodeGenericFuncDesc, Object[]) PcrExprProcFactory.java:317
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.process(Node,
>  Stack, NodeProcessorCtx, Object[]) PcrExprProcFactory.java:298
>   org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(Node, Stack, 
> Object[]) DefaultRuleDispatcher.java:90
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(Node, 
> Stack) DefaultGraphWalker.java:105
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(Node, Stack) 
> DefaultGraphWalker.java:89
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(Node) 
> DefaultGraphWalker.java:158
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(Collection, 
> HashMap) DefaultGraphWalker.java:120
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22106) PCR: Remove cross-query synchronization for the partition-eval

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22106:
--
Attachment: HIVE-22106.patch

> PCR: Remove cross-query synchronization for the partition-eval 
> ---
>
> Key: HIVE-22106
> URL: https://issues.apache.org/jira/browse/HIVE-22106
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22106.patch
>
>
> {code}
> HiveServer2-Handler-Pool: Thread-492  Blocked CPU usage on sample: 0ms
>   
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.evalExprWithPart(ExprNodeDesc,
>  Partition, List, StructObjectInspector) PartExprEvalUtils.java:58
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory.evalExprWithPart(ExprNodeDesc,
>  Partition, List) PcrExprProcFactory.java:83
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.handleDeterministicUdf(PcrExprProcCtx,
>  ExprNodeGenericFuncDesc, Object[]) PcrExprProcFactory.java:317
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.process(Node,
>  Stack, NodeProcessorCtx, Object[]) PcrExprProcFactory.java:298
>   org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(Node, Stack, 
> Object[]) DefaultRuleDispatcher.java:90
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(Node, 
> Stack) DefaultGraphWalker.java:105
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(Node, Stack) 
> DefaultGraphWalker.java:89
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(Node) 
> DefaultGraphWalker.java:158
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(Collection, 
> HashMap) DefaultGraphWalker.java:120
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-05 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Attachment: HIVE-22168.patch

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22168.patch, HIVE-22168.patch
>
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
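
The standard cure for this class of problem is to make the hot-path message conditional so its argument formatting is skipped when the level is off. A minimal sketch, assuming an SLF4J-style logger; not the exact patch:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: guard the hot-path cache log line.
final class CacheLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CacheLoggingSketch.class);

  static void logRequestRanges(Object bufferRanges) {
    if (LOG.isTraceEnabled()) {
      // Rendering the full list of buffer ranges is the expensive part,
      // so it only happens when tracing is explicitly enabled.
      LOG.trace("Cache request buffer ranges: {}", bufferRanges);
    }
  }
}
{code}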


[jira] [Assigned] (HIVE-22168) remove excessive logging by llap cache.

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-22168:
-


> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Attachment: HIVE-22168.patch

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22168.patch
>
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Status: Patch Available  (was: Open)

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22168.patch
>
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-04 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Description: Llap cache logging is very expensive when it comes to log 
every request buffers range.  (was: Lllap cache logging is very expensive when 
it comes to log every request buffers range.)

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Llap cache logging is very expensive when it comes to log every request 
> buffers range.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-06 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Attachment: HIVE-22168.patch

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-22168.patch, HIVE-22168.patch, HIVE-22168.patch
>
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HIVE-22055) select count gives incorrect result after loading data from text file

2019-09-13 Thread slim bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929300#comment-16929300
 ] 

slim bouguerra commented on HIVE-22055:
---

Thanks for this great finding! LGTM.

> select count gives incorrect result after loading data from text file
> -
>
> Key: HIVE-22055
> URL: https://issues.apache.org/jira/browse/HIVE-22055
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HIVE-22055.2.patch
>
>
> Add one more load to mm_loaddata.q:
> Load the data 3 times (both kv1.txt and kv2.txt contain 500 records):
> {code:java}
> create table load0_mm (key string, value string) stored as textfile 
> tblproperties("transactional"="true", 
> "transactional_properties"="insert_only");
> load data local inpath '../../data/files/kv1.txt' into table load0_mm;
> select count(1) from load0_mm;
> load data local inpath '../../data/files/kv2.txt' into table load0_mm;
> select count(1) from load0_mm;
> load data local inpath '../../data/files/kv2.txt' into table load0_mm;
> select count(1) from load0_mm;{code}
> Expected output
> {code:java}
> PREHOOK: query: load data local inpath '../../data/files/kv2.txt' into table 
> load0_mm
> PREHOOK: type: LOAD
> #### A masked pattern was here ####
> PREHOOK: Output: default@load0_mm
> POSTHOOK: query: load data local inpath '../../data/files/kv2.txt' into table 
> load0_mm
> POSTHOOK: type: LOAD
> #### A masked pattern was here ####
> POSTHOOK: Output: default@load0_mm
> PREHOOK: query: select count(1) from load0_mm
> PREHOOK: type: QUERY
> PREHOOK: Input: default@load0_mm
> #### A masked pattern was here ####
> POSTHOOK: query: select count(1) from load0_mm
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@load0_mm
> #### A masked pattern was here ####
> 1500{code}
> Got:
> [ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:59 Client Execution 
> succeeded but contained differences (error code = 1) after executing 
> mm_loaddata.q
> 63c63
> < 1480
> ---
> > 1500
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22168) remove excessive logging by llap cache.

2019-09-09 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22168:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

[https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=1f10d587620769c1647262b1956880296f95f0cd]

> remove excessive logging by llap cache.
> ---
>
> Key: HIVE-22168
> URL: https://issues.apache.org/jira/browse/HIVE-22168
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Logging
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22168.patch, HIVE-22168.patch, HIVE-22168.patch
>
>
> LLAP cache logging is very expensive when it comes to logging every 
> request's buffer ranges.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22106) PCR: Remove cross-query synchronization for the partition-eval

2019-09-09 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22106:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

[https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=1b0492479bab71ada88a7d5a0f0545074f06e821]

> PCR: Remove cross-query synchronization for the partition-eval 
> ---
>
> Key: HIVE-22106
> URL: https://issues.apache.org/jira/browse/HIVE-22106
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22106.patch
>
>
> {code}
> HiveServer2-Handler-Pool: Thread-492  Blocked CPU usage on sample: 0ms
>   
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.evalExprWithPart(ExprNodeDesc,
>  Partition, List, StructObjectInspector) PartExprEvalUtils.java:58
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory.evalExprWithPart(ExprNodeDesc,
>  Partition, List) PcrExprProcFactory.java:83
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.handleDeterministicUdf(PcrExprProcCtx,
>  ExprNodeGenericFuncDesc, Object[]) PcrExprProcFactory.java:317
>   
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrExprProcFactory$GenericFuncExprProcessor.process(Node,
>  Stack, NodeProcessorCtx, Object[]) PcrExprProcFactory.java:298
>   org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(Node, Stack, 
> Object[]) DefaultRuleDispatcher.java:90
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(Node, 
> Stack) DefaultGraphWalker.java:105
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(Node, Stack) 
> DefaultGraphWalker.java:89
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(Node) 
> DefaultGraphWalker.java:158
>   org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(Collection, 
> HashMap) DefaultGraphWalker.java:120
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22175) TestBudyAllocator#testMTT test is flaky

2019-09-11 Thread slim bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-22175:
--
Component/s: llap

> TestBudyAllocator#testMTT test is flaky
> ---
>
> Key: HIVE-22175
> URL: https://issues.apache.org/jira/browse/HIVE-22175
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Priority: Major
>
> This test has a fail rate of about 20%-25%



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HIVE-22175) TestBudyAllocator#testMTT test is flaky

2019-09-11 Thread slim bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927468#comment-16927468
 ] 

slim bouguerra commented on HIVE-22175:
---

[~szita] where can I find the logs of the failing tests?
My guess is that it might be a timeout or race-condition issue due to the 
multi-threading setup.


> TestBudyAllocator#testMTT test is flaky
> ---
>
> Key: HIVE-22175
> URL: https://issues.apache.org/jira/browse/HIVE-22175
> Project: Hive
>  Issue Type: Bug
>Reporter: Adam Szita
>Priority: Major
>
> This test has a fail rate of about 20%-25%



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-21391) LLAP: Pool of column vector buffers can cause memory pressure

2019-07-19 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-21391:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> LLAP: Pool of column vector buffers can cause memory pressure
> -
>
> Key: HIVE-21391
> URL: https://issues.apache.org/jira/browse/HIVE-21391
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: slim bouguerra
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21391.1.patch, HIVE-21391.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When there are too many columns (on the order of 100s) with decimal or string 
> types, the column vector pool of buffers created here 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/EncodedDataConsumer.java#L59]
> can cause memory pressure. 
> Example:
> 128 (poolSize) * 300 (numCols) * 1024 (batchSize) * 80 (decimalSize) ~= 3GB
> The pool size keeps increasing when there is a slow consumer but fast LLAP IO 
> (SSDs), leading to GC pressure when all LLAP IO threads read splits from the 
> same table. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
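
The ~3GB figure in the description checks out. A quick verification using the quoted sizes, with decimalSize taken as bytes per value:

{code:java}
// Sketch only: verify 128 * 300 * 1024 * 80 ~= 3GB.
public class PoolSizeEstimate {
  public static void main(String[] args) {
    long poolSize = 128, numCols = 300, batchSize = 1024, decimalSize = 80;
    long bytes = poolSize * numCols * batchSize * decimalSize;
    // Prints: 3,145,728,000 bytes ~= 2.93 GiB
    System.out.printf("%,d bytes ~= %.2f GiB%n",
        bytes, bytes / (1024.0 * 1024.0 * 1024.0));
  }
}
{code}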


[jira] [Commented] (HIVE-21901) Join queries across different datasources (Druid and JDBC StorageHandler)

2019-07-09 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881402#comment-16881402
 ] 

slim bouguerra commented on HIVE-21901:
---

In theory this class 
{code}
DruidSelectQueryRecordReader.java
{code}
should not be used, but I see it is still used in practice; will look into this. 

> Join queries across different datasources (Druid and JDBC StorageHandler)
> -
>
> Key: HIVE-21901
> URL: https://issues.apache.org/jira/browse/HIVE-21901
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration, StorageHandler
>Affects Versions: 3.1.1
>Reporter: Subramani Raju V
>Priority: Major
>
> We have a Druid datasource and an external table created in Hive for the 
> same datasource.
> For example: 
>  
> {code:java}
> CREATE EXTERNAL TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "wikipedia");
> {code}
>  
>  
> We have another table in a MySQL database, which also has an external table 
> created in Hive, in this fashion: 
>  
> {code:java}
> CREATE EXTERNAL TABLE sample_table_1
> (
> old_id int,
> city_name string,
> new_id int
> )
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "MYSQL",
> "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
> "hive.sql.jdbc.url" = "jdbc:mysql://172.16.0.15:3307/test",
> "hive.sql.dbcp.username" = "hive_user",
> "hive.sql.dbcp.password" = "hive_pass",
> "hive.sql.table" = "city_mapping"
> );
> {code}
> So we are able to perform normal queries on the individual tables, but when 
> we try to do a join operation on both of the above tables in this fashion: 
>  
>  
> {code:java}
> SELECT *
> FROM druid_table_1 o
> JOIN sample_table_1 c
> ON (c.city_name = o.channel) limit 10;
> {code}
> Then we are getting the error as follows: 
>  
>  
> {code:java}
> TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1560945328057_0022_2_01_00_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> 

[jira] [Comment Edited] (HIVE-21901) Join queries across different datasources (Druid and JDBC StorageHandler)

2019-07-09 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881402#comment-16881402
 ] 

slim bouguerra edited comment on HIVE-21901 at 7/9/19 5:54 PM:
---

In theory this class 
{code}
DruidSelectQueryRecordReader.java
{code}
should not be used, but I see it is still used in practice; will look into this. 


was (Author: bslim):
in theory this class 
{code }DruidSelectQueryRecordReader.java {code}
should not be used but i see in practice still used, will look at this. 

> Join queries across different datasources (Druid and JDBC StorageHandler)
> -
>
> Key: HIVE-21901
> URL: https://issues.apache.org/jira/browse/HIVE-21901
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration, StorageHandler
>Affects Versions: 3.1.1
>Reporter: Subramani Raju V
>Priority: Major
>
> We have a Druid datasource and an external table created in Hive for the 
> same datasource.
> For example: 
>  
> {code:java}
> CREATE EXTERNAL TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "wikipedia");
> {code}
>  
>  
> We have another table in a MySQL database, which also has an external table 
> created in Hive, in this fashion: 
>  
> {code:java}
> CREATE EXTERNAL TABLE sample_table_1
> (
> old_id int,
> city_name string,
> new_id int
> )
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "MYSQL",
> "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
> "hive.sql.jdbc.url" = "jdbc:mysql://172.16.0.15:3307/test",
> "hive.sql.dbcp.username" = "hive_user",
> "hive.sql.dbcp.password" = "hive_pass",
> "hive.sql.table" = "city_mapping"
> );
> {code}
> So we are able to perform normal queries on the individual tables, but when 
> we try to do a join operation on both of the above tables in this fashion: 
>  
>  
> {code:java}
> SELECT *
> FROM druid_table_1 o
> JOIN sample_table_1 c
> ON (c.city_name = o.channel) limit 10;
> {code}
> Then we are getting the error as follows: 
>  
>  
> {code:java}
> TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1560945328057_0022_2_01_00_1:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: java.io.IOException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> 

[jira] [Assigned] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-01 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22446:
-


> Make IO decoding quantiles counters less contended resource.
> 
>
> Key: HIVE-22446
> URL: https://issues.apache.org/jira/browse/HIVE-22446
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
> updates the IO decoding sample on a per-batch basis using
> {code} 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
> {code}
> via 
> {code} 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
> {code}
> This can be a source of thread contention.
> The goal of this ticket is to reduce the frequency of updates to avoid a 
> major bottleneck.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
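
One common way to make such a counter less contended is to sample: push into the lock-based quantiles structure only every Nth batch. A hedged sketch under that assumption; the interface is a stand-in for the Hadoop metrics type, and the real patch may batch differently:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: push a sample into the contended, lock-based quantiles
// structure on every Nth batch instead of on every batch.
final class SampledDecodeTimer {
  private static final int SAMPLE_EVERY = 64; // assumed sampling rate
  private final AtomicLong batches = new AtomicLong();

  void addDecodeBatchTime(long nanos, LockBasedQuantiles quantiles) {
    // A cheap atomic increment happens per batch; the expensive, locked
    // quantiles update happens only once per SAMPLE_EVERY batches.
    if (batches.incrementAndGet() % SAMPLE_EVERY == 0) {
      quantiles.add(nanos);
    }
  }

  interface LockBasedQuantiles { void add(long value); } // stand-in type
}
{code}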


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-08 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
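
To make the inconsistency concrete: '2019-09-09T10:45:49+02:00' parses on one code path and not the other. A hedged java.time sketch of the "enhance the parser" direction, normalizing both input shapes to a date; illustrative only, not Hive's actual parser:

{code:java}
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch only: accept both an ISO-8601 timestamp with offset and a plain
// date, so every execution path computes datediff from the same LocalDate.
public final class LenientDateParser {
  static LocalDate parse(String s) {
    try {
      return OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
          .toLocalDate();
    } catch (DateTimeParseException e) {
      return LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE);
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("2019-09-09T10:45:49+02:00")); // 2019-09-09
    System.out.println(parse("2019-07-24"));                // 2019-07-24
  }
}
{code}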


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-08 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Patch Available  (was: Open)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-08 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22476:
-


> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21894) Hadoop credential password storage for the Kafka Storage handler when security is SSL

2019-11-05 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967669#comment-16967669
 ] 

Slim Bouguerra commented on HIVE-21894:
---

[~kristopherkane] did you take a look at this work?
https://issues.apache.org/jira/browse/HIVE-20651
It is doing a very similar thing.
FYI, I have not looked at the PR yet; I hope to get to it ASAP.
Thanks

> Hadoop credential password storage for the Kafka Storage handler when 
> security is SSL
> -
>
> Key: HIVE-21894
> URL: https://issues.apache.org/jira/browse/HIVE-21894
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Kristopher Kane
>Assignee: Kristopher Kane
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Kafka storage handler assumes that if the Hive service is configured with 
> Kerberos then the destination Kafka cluster is also secured with the same 
> Kerberos realm or trust of realms. The security configuration of the Kafka 
> client can be overwritten due to the additive operations of the Kafka client 
> configs, but the only way to specify SSL and the keystore/truststore 
> user/pass is via plain-text table properties. 
> This ticket proposes adding Hadoop credential security to the Kafka storage 
> handler in support of SSL secured Kafka clusters.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
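
For background, Hadoop's credential provider keeps a password in a JCEKS keystore (created with 'hadoop credential create <alias> -provider jceks://file/...') and resolves it by alias at runtime. A hedged sketch of the lookup side; the alias is an assumption, not the storage handler's actual key:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

// Sketch only: resolve the Kafka SSL keystore password via the Hadoop
// credential provider chain instead of a plain-text table property.
final class KafkaSslPasswordSketch {
  static String keystorePassword(Configuration conf) throws IOException {
    // With hadoop.security.credential.provider.path set, getPassword()
    // reads the alias from the JCEKS store; otherwise it falls back to
    // the plain config value.
    char[] pass = conf.getPassword("kafka.ssl.keystore.password"); // assumed alias
    return pass == null ? null : new String(pass);
  }
}
{code}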


[jira] [Updated] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-05 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22446:
--
Attachment: HIVE-22446.patch

> Make IO decoding quantiles counters less contended resource.
> 
>
> Key: HIVE-22446
> URL: https://issues.apache.org/jira/browse/HIVE-22446
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22446.patch
>
>
> Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
> updates the IO decoding sample on a per-batch basis using
> {code} 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
> {code}
> via 
> {code} 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
> {code}
> This can be a source of thread contention.
> The goal of this ticket is to reduce the frequency of updates to avoid a 
> major bottleneck.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-05 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967803#comment-16967803
 ] 

Slim Bouguerra commented on HIVE-22446:
---

[~t3rmin4t0r] can you take a look at this? If you are okay with it, I will add 
more testing.

> Make IO decoding quantiles counters less contended resource.
> 
>
> Key: HIVE-22446
> URL: https://issues.apache.org/jira/browse/HIVE-22446
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22446.patch
>
>
> Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
> updates the IO decoding sample on a per-batch basis using
> {code} 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
> {code}
> via 
> {code} 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
> {code}
> This can be a source of thread contention.
> The goal of this ticket is to reduce the frequency of updates to avoid a 
> major bottleneck.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-05 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22446 started by Slim Bouguerra.
-
> Make IO decoding quantiles counters less contended resource.
> 
>
> Key: HIVE-22446
> URL: https://issues.apache.org/jira/browse/HIVE-22446
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
> updates the IO decoding sample on a per-batch basis using
> {code} 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
> {code}
> via 
> {code} 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
> {code}
> This can be a source of thread contention.
> The goal of this ticket is to reduce the frequency of updates to avoid a 
> major bottleneck.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-05 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22446:
--
Status: Patch Available  (was: In Progress)

> Make IO decoding quantiles counters less contended resource.
> 
>
> Key: HIVE-22446
> URL: https://issues.apache.org/jira/browse/HIVE-22446
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
> updates the IO decoding sample on a per-batch basis using
> {code} 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
> {code}
> via 
> {code} 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
> {code}
> This can be a source of thread contention.
> The goal of this ticket is to reduce the frequency of updates to avoid a 
> major bottleneck.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22436) Add more logging to the test.

2019-10-30 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22436:
--
Attachment: HIVE-22436.patch

> Add more logging to the test.
> -
>
> Key: HIVE-22436
> URL: https://issues.apache.org/jira/browse/HIVE-22436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22436.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22436) Add more logging to the test.

2019-10-30 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22436:
-


> Add more logging to the test.
> -
>
> Key: HIVE-22436
> URL: https://issues.apache.org/jira/browse/HIVE-22436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22436) Add more logging to the test.

2019-10-30 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22436:
--
Status: Patch Available  (was: Open)

> Add more logging to the test.
> -
>
> Key: HIVE-22436
> URL: https://issues.apache.org/jira/browse/HIVE-22436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22175) TestBudyAllocator#testMTT test is flaky

2019-10-30 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963266#comment-16963266
 ] 

Slim Bouguerra commented on HIVE-22175:
---

[~abstractdog] I am not dropping the ball on this; I will augment the test with 
more logging and keep looking.
Thanks for letting me know.


> TestBudyAllocator#testMTT test is flaky
> ---
>
> Key: HIVE-22175
> URL: https://issues.apache.org/jira/browse/HIVE-22175
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Ádám Szita
>Assignee: John Sherman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22175.1.patch, HIVE-22175.2.patch, 
> HIVE-22175.3.patch, HIVE-22175.4.patch, HIVE-22175.5.patch
>
>
> This test has a fail rate of about 20%-25%



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2019-10-30 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22437:
-


> LLAP Metadata cache NPE on locking metadata.
> 
>
> Key: HIVE-22437
> URL: https://issues.apache.org/jira/browse/HIVE-22437
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
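
The trace points at unlockSingleBuffer dereferencing a buffer that was already evicted or replaced. A hedged sketch of the defensive shape of such a fix; the types are stand-ins, and the real MetadataCache logic is more involved:

{code:java}
// Sketch only: tolerate a null/evicted buffer instead of throwing an NPE
// while racing with cache eviction.
final class MetadataCacheSketch {
  static void unlockSingleBuffer(LlapBuffer buffer) {
    if (buffer == null) {
      return; // buffer already evicted or replaced by a concurrent put
    }
    buffer.decRef();
  }

  interface LlapBuffer { void decRef(); } // stand-in for the cache buffer type
}
{code}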


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.4.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.4.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.ferch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}
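
The repro hinges on the first value being an ISO-8601 timestamp with a zone 
offset, which a strict date-only parser rejects. A hedged sketch of the kind of 
fallback parsing the enhanced parser needs, using java.time purely for 
illustration (the actual patch enhances Hive's internal parser in 
GenericUDFToDate):

{code:java}
// Hedged sketch: shows why '2019-09-09T10:45:49+02:00' needs a more lenient
// parser than a plain 'yyyy-MM-dd' format. Uses java.time for illustration
// only; the real fix extends Hive's own date parsing.
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

public class LenientDateParse {
  static LocalDate parseToDate(String s) {
    try {
      return LocalDate.parse(s);                     // handles '2019-07-24'
    } catch (DateTimeParseException e) {
      // Fall back to a full offset timestamp, then truncate to the date part.
      return OffsetDateTime.parse(s).toLocalDate();  // handles '2019-09-09T10:45:49+02:00'
    }
  }

  public static void main(String[] args) {
    System.out.println(parseToDate("2019-07-24"));                // 2019-07-24
    System.out.println(parseToDate("2019-09-09T10:45:49+02:00")); // 2019-09-09
  }
}
{code}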



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: (was: HIVE-22476.4.patch)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.5.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22492:
-


> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf
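
The BP-Wrapper trick, roughly: accesses are recorded in a small batch without 
touching the policy lock, and the lock is then taken once per batch (or skipped 
via tryLock when another thread already holds it). A minimal sketch with 
hypothetical names; the actual patch wraps LowLevelLrfuCachePolicy:

{code:java}
// Hedged sketch of a BP-Wrapper-style batching layer (hypothetical names; the
// real patch wraps LowLevelLrfuCachePolicy). Accesses queue up lock-free, and
// the shared policy lock is acquired once per batch instead of per access.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

class BatchingLrfuWrapper<T> {
  private final ArrayBlockingQueue<T> batch = new ArrayBlockingQueue<>(256);
  private final ReentrantLock policyLock = new ReentrantLock();

  /** Hot path: record the access without contending on the policy lock. */
  void notifyUnlock(T buffer) {
    while (!batch.offer(buffer)) {
      drain(true);          // batch full: one acquisition drains many accesses
    }
    drain(false);           // opportunistic drain: never blocks if lock is busy
  }

  private void drain(boolean mustAcquire) {
    if (mustAcquire) {
      policyLock.lock();
    } else if (!policyLock.tryLock()) {
      return;               // someone else will do the accounting for us
    }
    try {
      T b;
      while ((b = batch.poll()) != null) {
        applyLrfuAccounting(b);  // the formerly contended per-access work
      }
    } finally {
      policyLock.unlock();
    }
  }

  private void applyLrfuAccounting(T buffer) {
    // Update the LRFU heap/list position under the single policy lock.
  }
}
{code}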



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973593#comment-16973593
 ] 

Slim Bouguerra commented on HIVE-22492:
---

[~t3rmin4t0r] can you take a look at this? It seems to be very low-hanging fruit.

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: HIVE-22492.patch

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973569#comment-16973569
 ] 

Slim Bouguerra commented on HIVE-22492:
---

{code}
IO-Elevator-Thread-9  Blocked CPU usage on sample: 609ms
  org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.notifyUnlock(LlapCacheableBuffer) LowLevelLrfuCachePolicy.java:125
  org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyUnlock(LlapCacheableBuffer) CacheContentsTracker.java:173
  org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.unlockBuffer(LlapDataBuffer, boolean) LowLevelCacheImpl.java:391
  org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.decRefBuffers(List) LowLevelCacheImpl.java:379
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(Reader$OrcEncodedColumnBatch) OrcEncodedDataReader.java:759
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(Object) OrcEncodedDataReader.java:110
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.returnSourceData(EncodedColumnBatch) EncodedDataConsumer.java:100
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedColumnBatch) EncodedDataConsumer.java:92
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(Object) EncodedDataConsumer.java:34
  org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int, StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], Consumer) EncodedReaderImpl.java:532
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() OrcEncodedDataReader.java:407
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:266
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:263
  java.security.AccessController.doPrivileged(PrivilegedExceptionAction, AccessControlContext) AccessController.java (native)
  javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java:422
  org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1688
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:263
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:110
  org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
  org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call() StatsRecordingThreadPool.java:110
  java.util.concurrent.FutureTask.run() FutureTask.java:266
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142
  java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
  java.lang.Thread.run() Thread.java:745
{code}

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.3.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Status: Patch Available  (was: Open)

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: llap-lock-contention.svg

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22459) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972033#comment-16972033
 ] 

Slim Bouguerra commented on HIVE-22459:
---

I think this is essentially the same issue; working on it.

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22459
> URL: https://issues.apache.org/jira/browse/HIVE-22459
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Chiran Ravani
>Priority: Critical
>
> Hive datediff function provides inconsistent results when 
> hive.fetch.task.conversion is set to more.
> Below is the output, whereas in Hive 1.2 the results are consistent.
> Note: The same query works well on Hive 3 when hive.fetch.task.conversion is 
> set to none.
> Steps to reproduce the problem:
> {code:java}
> 0: jdbc:hive2://c1113-node2.squadron.support.> select datetimecol from 
> testdatediff where datediff(cast(current_timestamp as string), 
> datetimecol)<183;
> INFO : Compiling 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:datetimecol, type:string, 
> comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268); 
> Time taken: 0.479 seconds
> INFO : Executing 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> INFO : Completed executing 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268); 
> Time taken: 0.013 seconds
> INFO : OK
> +----------------------------+
> | datetimecol                |
> +----------------------------+
> | 2019-07-24                 |
> +----------------------------+
> 1 row selected (0.797 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.>
> {code}
> After setting fetch task conversion as none.
> {code:java}
> 0: jdbc:hive2://c1113-node2.squadron.support.> set 
> hive.fetch.task.conversion=none;
> No rows affected (0.017 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.> set hive.fetch.task.conversion;
> +----------------------------------+
> |               set                |
> +----------------------------------+
> | hive.fetch.task.conversion=none  |
> +----------------------------------+
> 1 row selected (0.015 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.> select datetimecol from 
> testdatediff where datediff(cast(current_timestamp as string), 
> datetimecol)<183;
> INFO : Compiling 
> command(queryId=hive_20191105103709_0c38e446-09cf-45dd-9553-365146f42452): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> +----------------------------+
> | datetimecol                |
> +----------------------------+
> | 2019-09-09T10:45:49+02:00  |
> | 2019-07-24                 |
> +----------------------------+
> 2 rows selected (5.327 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.>
> {code}
> Steps to reproduce
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-12 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.2.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: (was: HIVE-22476.patch)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Open  (was: Patch Available)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.6.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Patch Available  (was: Open)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: HIVE-22492.2.patch

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22493) Scheduled Query Execution Failure in Tests

2019-11-15 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975354#comment-16975354
 ] 

Slim Bouguerra commented on HIVE-22493:
---

+1

> Scheduled Query Execution Failure in Tests
> --
>
> Key: HIVE-22493
> URL: https://issues.apache.org/jira/browse/HIVE-22493
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Zoltan Haindrich
>Priority: Critical
> Attachments: HIVE-22493.01.patch
>
>
> {code:none}
> org.apache.hadoop.hive.schq.TestScheduledQueryIntegration.testScheduledQueryExecutionImpersonation (batchId=279)
> org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=284)
> org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=284)
> {code}
> {code:none}
> 2019-11-12T18:11:00,181  INFO [pool-20-thread-10] HiveMetaStore.audit: ugi=hiveptest ip=127.0.0.1 cmd=source:127.0.0.1 scheduled_query_poll
> 2019-11-12T18:11:00,182  INFO [pool-20-thread-10] metastore.HiveMetaStore: 25: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 2019-11-12T18:11:00,183  INFO [pool-20-thread-10] metastore.PersistenceManagerProvider: Updating the pmf due to property change
> 2019-11-12T18:11:00,184 ERROR [pool-20-thread-10] metastore.HiveMetaStore: Caught exception
> javax.jdo.JDOUserException: Cant close PersistenceManagerFactory while we have active transactions.
>   at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.close(JDOPersistenceManagerFactory.java:603) ~[datanucleus-api-jdo-4.2.4.jar:?]
>   at org.apache.hadoop.hive.metastore.PersistenceManagerProvider.updatePmfProperties(PersistenceManagerProvider.java:199) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:213) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) ~[hadoop-common-3.1.0.jar:?]
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) ~[hadoop-common-3.1.0.jar:?]
>   at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:852) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:820) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:814) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.scheduled_query_poll(HiveMetaStore.java:9660) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_102]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_102]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_102]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy46.scheduled_query_poll(Unknown Source) [?:?]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$scheduled_query_poll.getResult(ThriftHiveMetastore.java:21561) [hive-standalone-metastore-common-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$scheduled_query_poll.getResult(ThriftHiveMetastore.java:21545) [hive-standalone-metastore-common-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [libthrift-0.9.3-1.jar:0.9.3-1]
>   at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_102]
>   at 

[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.7.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Comment: was deleted

(was: 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67)

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976772#comment-16976772
 ] 

Slim Bouguerra commented on HIVE-22492:
---

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> The trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-22 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980380#comment-16980380
 ] 

Slim Bouguerra commented on HIVE-22523:
---

[~amagyar] then please leave the {code}enqueueInternal{code} call as is, since 
it is by design non-blocking, and it is not a good idea to remove it if there is 
no reason. As I said, I do not think this is the issue causing the OOM.

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary, but it 
> might also block the caller of the handler if the queue is full. Also, closing 
> of the reader might not be properly handled, as some of the flags are not 
> volatile.
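
A hedged sketch of the direction the description points to, with simplified 
names (the real LlapRecordReader also has to wake up its consumer): publish the 
error through the atomic reference and volatile flags only, so setError() can 
never block on a full queue:

{code:java}
// Hedged sketch of the suggested shape of the fix (simplified names; not the
// actual LlapRecordReader). The error is published through an atomic reference
// and a volatile flag instead of the bounded data queue, so the error handler
// never blocks, and the reader surfaces the error from its own poll loop.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

class ErrorHandlingSketch {
  private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
  private volatile boolean isClosed;   // volatile so the reader thread sees the update

  void setError(Throwable t) {
    pendingError.compareAndSet(null, t);  // non-blocking publication; first error wins
  }

  void rethrowErrorIfAny() throws IOException {
    Throwable t = pendingError.get();
    if (t != null) {
      throw new IOException(t);           // surfaced from the reader's poll loop
    }
  }

  void close() { isClosed = true; }
  boolean closed() { return isClosed; }
}
{code}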



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979516#comment-16979516
 ] 

Slim Bouguerra commented on HIVE-22523:
---

[~amagyar] {code}org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader#enqueueInternal{code} 
is not blocking; can you please explain more about what the issue is? Is it a 
variable-read visibility issue?

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary, but it 
> might also block the caller of the handler if the queue is full. Also, closing 
> of the reader might not be properly handled, as some of the flags are not 
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.8.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more format support to 
> GenericUDFToDate by enhancing the parser.
> For the longer term, it will be better to use one parser for all the 
> operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979550#comment-16979550
 ] 

Slim Bouguerra commented on HIVE-22523:
---

As per the code, it will wait for 100 ms, and the next round should exit if one 
of the flags is set.
{code}
private void enqueueInternal(Object o) throws InterruptedException {
  // We need to loop here to handle the case where the consumer goes away.
  do {} while (!isClosed && !isInterrupted
      && !queue.offer(o, 100, TimeUnit.MILLISECONDS));
}
{code}

Are you saying that in some cases the flags are not set, or that the update is 
not visible to the thread?
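
For reference, a minimal, self-contained illustration of the visibility concern 
being asked about (a hypothetical example, not Hive code): with the flag declared 
volatile, the polling loop is guaranteed to observe the writer's update; without 
it, the Java memory model permits the loop to spin indefinitely.

{code:java}
// Hypothetical illustration of flag visibility (not Hive code). With 'volatile'
// the loop is guaranteed to observe close(); drop the keyword and the JMM
// allows the poller to cache the old value and never exit.
public class VisibilitySketch {
  private volatile boolean isClosed;   // remove 'volatile' and the loop may spin forever

  void pollLoop() throws InterruptedException {
    while (!isClosed) {
      Thread.sleep(100);               // stands in for queue.offer(..., 100 ms)
    }
  }

  void close() { isClosed = true; }

  public static void main(String[] args) throws Exception {
    VisibilitySketch r = new VisibilitySketch();
    Thread t = new Thread(() -> {
      try { r.pollLoop(); } catch (InterruptedException ignored) { }
    });
    t.start();
    Thread.sleep(300);
    r.close();                         // update becomes visible to the poller
    t.join();
    System.out.println("loop exited");
  }
}
{code}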

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary, but it 
> might also block the caller of the handler if the queue is full. Also, closing 
> of the reader might not be properly handled, as some of the flags are not 
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools

2019-12-10 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22558:
--
Status: Patch Available  (was: Open)

> Metastore: Passwords jceks should be read lazily, in case of connection pools
> -
>
> Key: HIVE-22558
> URL: https://issues.apache.org/jira/browse/HIVE-22558
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Reporter: Gopal Vijayaraghavan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: getDatabase-password-md5-hotpath.png
>
>
> The jceks file is parsed for every instance of the metastore conf to populate 
> the password in plain text, which is unnecessary in the scenario where the DB 
> connection pool is already active.
>   !getDatabase-password-md5-hotpath.png|width=640!
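
A minimal sketch of the lazy pattern this calls for, with a hypothetical helper 
class (the actual change belongs in the metastore conf's password lookup): 
memoize the jceks read so the plain-text password is materialized once, on first 
use, rather than on every conf instantiation.

{code:java}
// Minimal sketch of lazy, memoized password resolution (hypothetical helper
// class; the real change lives in the metastore conf's password lookup). The
// expensive jceks parse runs at most once, on first use, via double-checked
// locking on a volatile field.
import java.util.function.Supplier;

final class LazyPassword {
  private volatile String cached;               // safe publication of the result
  private final Supplier<String> jceksReader;   // the expensive jceks parse

  LazyPassword(Supplier<String> jceksReader) {
    this.jceksReader = jceksReader;
  }

  String get() {
    String p = cached;
    if (p == null) {
      synchronized (this) {
        if (cached == null) {
          cached = jceksReader.get();           // parse jceks once, lazily
        }
        p = cached;
      }
    }
    return p;
  }
}
{code}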



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

