[jira] [Updated] (HIVE-11692) Fix UT regressions on hbase-metastore branch

2015-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11692:
--
Attachment: HIVE-11692.1-hbase-metastore.patch

> Fix UT regressions on hbase-metastore branch
> 
>
> Key: HIVE-11692
> URL: https://issues.apache.org/jira/browse/HIVE-11692
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11692.1-hbase-metastore.patch
>
>
> There are several unit test regressions on hbase-metastore:
> TestWebHCatE2e (asm-5.0.jar conflicts with jersey)
> TestHBaseImport (leaves some objects behind, causing other tests to fail)
> TestMiniHBaseMetastoreCliDriver (the test itself should not exist)
> TestCliDriver (dynpart_sort_opt_vectorization.q, dynpart_sort_optimization.q)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11662) DP cannot be applied to external table which contains part-spec like directory

2015-08-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723042#comment-14723042
 ] 

Navis commented on HIVE-11662:
--

The failures do not seem to be related to this patch. I'll add some test cases.

> DP cannot be applied to external table which contains part-spec like directory
> --
>
> Key: HIVE-11662
> URL: https://issues.apache.org/jira/browse/HIVE-11662
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11662.1.patch.txt
>
>
> Some users want to use part-spec-like directory names in their partitioned 
> table locations, for example:
> {noformat}
> /something/warehouse/some_key=some_value
> {noformat}
> DP (dynamic partitioning) derives additional partition columns from the full 
> path and throws an exception such as:
> {noformat}
> Failed with exception Partition spec {some_key=some_value, 
> part_key=part_value} contains non-partition columns
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> {noformat}
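
The failure mode described above can be illustrated with a small sketch. This is not Hive's code: `part_spec_from_path` and its `base` parameter are hypothetical names, and the sketch only assumes that a partition spec is read from `key=value` directory components. Parsing the full path picks up the `some_key=some_value` component of the table location, whereas parsing relative to the table location yields only the real partition column.

```python
from pathlib import PurePosixPath


def part_spec_from_path(path, base=None):
    """Collect key=value directory components (hypothetical helper).

    If base is given, only components below base are considered.
    """
    p = PurePosixPath(path)
    if base is not None:
        p = p.relative_to(PurePosixPath(base))
    spec = {}
    for component in p.parts:
        if "=" in component:
            key, value = component.split("=", 1)
            spec[key] = value
    return spec


# Table location taken from the issue description; the partition directory
# below it is assumed for illustration.
table_location = "/something/warehouse/some_key=some_value"
partition_dir = table_location + "/part_key=part_value"

# Parsing the full path also captures the table location's own component.
full = part_spec_from_path(partition_dir)

# Parsing relative to the table location keeps only the partition column.
relative = part_spec_from_path(partition_dir, base=table_location)

print(full)
print(relative)
```

Whether the actual patch resolves the path relative to the table location is not stated in this thread; the sketch only shows why the full path produces a spec with a non-partition column.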





[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-08-30 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722971#comment-14722971
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-

RB request updated with latest patch.

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt
>
>
> It seems that the capability to add a filter to an HBase scan, which was added 
> as part of HIVE-6411, doesn't work. This is primarily because 
> HiveHBaseInputFormat adds the filter in getSplits instead of 
> getRecordReader. This works for start and stop keys but not for a filter, 
> because a filter is respected only when an actual scan is performed. This is 
> also related to the initial refactoring that was done as part of HIVE-3420.
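
As a rough illustration of the distinction drawn above, here is a toy model in Python. It is not the HBase or Hive API: `REGIONS`, `get_splits`, and `read_split` are hypothetical stand-ins. Split planning only intersects key ranges, so start and stop keys take effect there, while a row filter can only matter at the point where rows are actually read.

```python
# Hypothetical region boundaries as [start, stop) key ranges.
REGIONS = [("a", "m"), ("m", "{")]


def get_splits(start, stop):
    """Planning step: intersect the requested key range with region boundaries.

    No rows are read here, so a row filter would have nothing to act on.
    """
    splits = []
    for region_start, region_stop in REGIONS:
        lo, hi = max(start, region_start), min(stop, region_stop)
        if lo < hi:
            splits.append((lo, hi))
    return splits


def read_split(table, split, row_filter=None):
    """Reading step: the scan runs here, so this is where a filter applies."""
    lo, hi = split
    for key in sorted(table):
        if lo <= key < hi and (row_filter is None or row_filter(key, table[key])):
            yield key, table[key]


table = {"apple": 1, "banana": 2, "mango": 3, "peach": 4}
splits = get_splits("a", "q")


def keep_even(key, value):
    return value % 2 == 0


# Without the filter at read time, every row in the key range comes back.
unfiltered = [kv for s in splits for kv in read_split(table, s)]

# Attaching the filter where the scan is performed restricts the rows.
filtered = [kv for s in splits for kv in read_split(table, s, keep_even)]

print(splits)
print(unfiltered)
print(filtered)
```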





[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-08-30 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722968#comment-14722968
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-

Some more info on my test environment:

1. My keys were salted, which means no hot regions and an even key distribution 
across regions.
2. No pre-splits when loading data, though I feel the performance would have 
been even better if the regions had been pre-split.
3. My CompositeKeyFactory implementation takes the successive 
predicates (key1="something" and key2="something2") and converts them to an apt scan 
filter. So results on those types of queries depend entirely on how you 
handle the predicate-to-filter conversion logic inside your custom 
implementation.

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt
>
>





[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-08-30 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11609:

Attachment: HIVE-11609.2.patch.txt

Updating patch to address the failing test.

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt
>
>





[jira] [Commented] (HIVE-11671) Optimize RuleRegExp in DPP codepath

2015-08-30 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721928#comment-14721928
 ] 

Rajesh Balamohan commented on HIVE-11671:
-

Thanks [~hsubramaniyan]. Not sure why the builds are not yet triggered. Should 
I cancel and hit resubmit to re-trigger the build?

> Optimize RuleRegExp in DPP codepath
> ---
>
> Key: HIVE-11671
> URL: https://issues.apache.org/jira/browse/HIVE-11671
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-11671.1.patch, cpu_with_patch.png, 
> cpu_without_patch.png, mem_with_patch.png, mem_without_patch.png
>
>
> When running a large query with DPP in its codepath, RuleRegExp came up as 
> a hotspot. Creating this JIRA to optimize RuleRegExp.java.





[jira] [Commented] (HIVE-11482) Add retrying thrift client for HiveServer2

2015-08-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721800#comment-14721800
 ] 

Hive QA commented on HIVE-11482:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753210/HIVE-11482.01.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9382 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionall_unbalancedppd
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5119/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5119/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5119/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753210 - PreCommit-HIVE-TRUNK-Build

> Add retrying thrift client for HiveServer2
> --
>
> Key: HIVE-11482
> URL: https://issues.apache.org/jira/browse/HIVE-11482
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-11482.01.patch
>
>
> Similar to 
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java,
>  this improvement request is to add a retrying thrift client for HiveServer2 
> that retries upon thrift exceptions.
> Here are a few commits from a forked branch that can be picked up: 
> https://github.com/InMobi/hive/commit/7fb957fb9c2b6000d37c53294e256460010cb6b7
> https://github.com/InMobi/hive/commit/11e4b330f051c3f58927a276d562446761c9cd6d
> https://github.com/InMobi/hive/commit/241386fd870373a9253dca0bcbdd4ea7e665406c
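
The general idea of a retrying client can be sketched as a thin proxy that re-invokes a call after a transport-level failure. The sketch below is in Python and is not taken from the linked commits or from RetryingMetaStoreClient; `TransientError`, `RetryingClient`, and `FlakyClient` are hypothetical names used only for illustration.

```python
import time


class TransientError(Exception):
    """Stand-in for a transport-level Thrift exception (hypothetical)."""


class RetryingClient:
    """Proxy that retries any method call that fails with TransientError."""

    def __init__(self, factory, retries=3, delay_seconds=0.0):
        self._factory = factory      # creates a fresh underlying client
        self._retries = retries      # number of retries after the first attempt
        self._delay = delay_seconds
        self._client = factory()

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def call(*args, **kwargs):
            last_error = None
            for _attempt in range(self._retries + 1):
                try:
                    return getattr(self._client, name)(*args, **kwargs)
                except TransientError as err:
                    last_error = err
                    time.sleep(self._delay)
                    self._client = self._factory()  # reconnect before retrying
            raise last_error

        return call


class FlakyClient:
    """Test double that fails twice, then succeeds."""

    calls = 0  # class-level so reconnects do not reset the counter

    def execute(self, statement):
        FlakyClient.calls += 1
        if FlakyClient.calls < 3:
            raise TransientError("connection reset")
        return f"ok: {statement}"


client = RetryingClient(FlakyClient, retries=3)
result = client.execute("SELECT 1")
print(result, FlakyClient.calls)
```

One design question such a wrapper raises is which calls are safe to repeat: blindly retrying a statement that is not idempotent could execute it twice, so a real client would likely need to restrict retries to specific calls or exception types.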





[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2

2015-08-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721766#comment-14721766
 ] 

Lefty Leverenz commented on HIVE-11663:
---

Thanks for the changes, I'll post more review comments soon.  But you need code 
review more than doc review, and I'm not qualified to review code.  Perhaps if 
you posted the patch on the review board you would get more responses.

* [Review Board | https://cwiki.apache.org/confluence/display/Hive/Review+Board]

If you have any trouble with the review board, I'll gladly help.  (I had 
trouble the first time, and finally discovered that I had to use a Firefox 
browser instead of Safari to get it to work.)

> Auto load/unload custom udf function for hive cli and hiveserver2
> -
>
> Key: HIVE-11663
> URL: https://issues.apache.org/jira/browse/HIVE-11663
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Configuration
>Affects Versions: 0.14.0, 1.0.0, 1.0.1, 1.1.1, 1.2.1
>Reporter: liuzongquan
>Assignee: liuzongquan
>  Labels: features, patch
> Attachments: HIVE-11663-2.patch
>
>   Original Estimate: 96h
>  Time Spent: 96h
>  Remaining Estimate: 0h
>
> When adding custom functions for use in HiveServer2, the most common method is to rebuild 
> the Hive source code, redistribute it, and restart HiveServer2. This approach imposes 
> a high cost on service users and cluster managers. A better approach, in my opinion, 
> is for custom UDFs to act as plugins to HiveServer2 and the Hive CLI that 
> users can add and remove at run time, especially for HiveServer2.





[jira] [Updated] (HIVE-10459) Add materialized views to Hive

2015-08-30 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-10459:
--
Component/s: Views

> Add materialized views to Hive
> --
>
> Key: HIVE-10459
> URL: https://issues.apache.org/jira/browse/HIVE-10459
> Project: Hive
>  Issue Type: Improvement
>  Components: Views
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-10459.2.patch, HIVE-10459.patch
>
>
> Materialized views are useful as ways to store either alternate versions of 
> data (e.g. same data, different sort order) or derivatives of data sets (e.g. 
> commonly used aggregates).  It is useful to store these as materialized views 
> rather than as tables because it can give the optimizer the ability to 
> understand how data sets are related.





[jira] [Commented] (HIVE-11482) Add retrying thrift client for HiveServer2

2015-08-30 Thread Akshay Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721711#comment-14721711
 ] 

Akshay Goyal commented on HIVE-11482:
-

Created review request (https://reviews.apache.org/r/37935/) and attached the 
patch.

> Add retrying thrift client for HiveServer2
> --
>
> Key: HIVE-11482
> URL: https://issues.apache.org/jira/browse/HIVE-11482
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-11482.01.patch
>
>





[jira] [Updated] (HIVE-11482) Add retrying thrift client for HiveServer2

2015-08-30 Thread Akshay Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Goyal updated HIVE-11482:

Attachment: HIVE-11482.01.patch

> Add retrying thrift client for HiveServer2
> --
>
> Key: HIVE-11482
> URL: https://issues.apache.org/jira/browse/HIVE-11482
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-11482.01.patch
>
>





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721683#comment-14721683
 ] 

Hive QA commented on HIVE-11587:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753199/HIVE-11587.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9381 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5118/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5118/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5118/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753199 - PreCommit-HIVE-TRUNK-Build

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch
>
>
> Due to the legacy of in-memory mapjoin and conservative planning, the memory 
> estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because, unlike the probe, the data is free to resize, so it's better for 
> perf to allocate a big probe and hope for the best with regard to future data 
> size. This is not true for the hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to the hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for the group by case), so it no longer caps the memory below 1 GB. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure the "initialCapacity" argument from the hybrid case is correct given the 
> number of segments. See how it's calculated from keys for the regular case; it 
> needs to be adjusted accordingly for the hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is the row count (assuming one key per row, i.e. 
> the maximum possible number of keys) divided by the load factor, plus some very small 
> number to round up. That is for the flat case. For the hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something similar, and make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in the hybrid case.
> 3) Make sure that memory estimation for the hybrid case also doesn't come up with 
> numbers that are too small, like a 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size at 8-16 MB. The whole point of WBs is that you 
> should not allocate a large array in advance. Even if some estimate passes 
> 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs; only allocate on write.
> 6) Everywhere rounding up to a power of two is used, change it to rounding down, at 
> least for the hybrid case (?)
> I wanted to put all of these items in a single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already; feel free to link them to 
> this one.
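
The arithmetic in items 1.5, 4, and 6 above can be worked through in a few lines. This is a sketch, not Hive's estimation code: the function names, the 0.75 load factor, the row count, and the 8 MB cap (the low end of the 8-16 MB range in item 4) are all assumptions chosen for illustration.

```python
import math


def max_probe_capacity(row_count, load_factor):
    """Upper bound on probe slots: one key per row, divided by load factor."""
    return math.ceil(row_count / load_factor)


def round_up_pow2(n):
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def round_down_pow2(n):
    """Largest power of two that is <= n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def capped_write_buffer_size(estimate_bytes, cap_bytes=8 * 1024 * 1024):
    """Never pre-allocate more than the cap, whatever the estimate says."""
    return min(estimate_bytes, cap_bytes)


rows = 1_000_000          # assumed row count
load_factor = 0.75        # assumed load factor

bound = max_probe_capacity(rows, load_factor)
up = round_up_pow2(bound)
down = round_down_pow2(bound)
wb = capped_write_buffer_size(500 * 1024 * 1024)

print(bound, up, down, wb)
```

With these assumed inputs, rounding up nearly doubles the allocation relative to rounding down, which is the kind of gap item 6 is concerned with; note that rounding down lands below the computed bound, so it trades memory for a higher effective load.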





[jira] [Commented] (HIVE-11689) minor flow changes to ORC split generation

2015-08-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721621#comment-14721621
 ] 

Prasanth Jayachandran commented on HIVE-11689:
--

LGTM, +1

> minor flow changes to ORC split generation
> --
>
> Key: HIVE-11689
> URL: https://issues.apache.org/jira/browse/HIVE-11689
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11689.patch
>
>
> There are two changes that would help future work on split PPD into HBase 
> metastore. 
> 1) Move non-HDFS split strategy determination logic into the main thread from 
> the threadpool.
> 2) Instead of iterating through the futures and waiting, use CompletionService 
> to get futures in order of completion. That might be useful by itself.
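
The second change can be illustrated by analogy. The description refers to Java's CompletionService; the sketch below uses Python's `concurrent.futures.as_completed`, which plays a similar role, and is not the ORC split generation code. The task names and sleep durations are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def generate_splits(name, seconds):
    """Pretend to compute splits for one directory (hypothetical work item)."""
    time.sleep(seconds)
    return name


work = [("slow_dir", 0.3), ("fast_dir", 0.0), ("medium_dir", 0.15)]

# Iterating the futures in submission order: the first result() call blocks
# on the slowest task even though the others have already finished.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(generate_splits, n, s) for n, s in work]
    in_submission_order = [f.result() for f in futures]

# Consuming futures as they complete: finished work is handled right away.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(generate_splits, n, s) for n, s in work]
    in_completion_order = [f.result() for f in as_completed(futures)]

print(in_submission_order)
print(in_completion_order)  # order depends on timing
```

Both loops collect the same results; the difference is that the second one can start processing a finished task without waiting for an earlier, slower one.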





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-30 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11587:
-
Attachment: HIVE-11587.04.patch

Uploaded patch 4. Added an empty write buffer check to the seal method.

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch
>
>





[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-08-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721438#comment-14721438
 ] 

Hive QA commented on HIVE-11609:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753168/HIVE-11609.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5117/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5117/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5117/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753168 - PreCommit-HIVE-TRUNK-Build

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt
>
>


