[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801560#comment-13801560 ]
Hudson commented on HIVE-4957:
------------------------------
FAILURE: Integrated in Hive-trunk-hadoop2-ptest #147 (See
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/147/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory
(Shreepadma Venugopalan via Brock Noland) (brock:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534337)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out
> Restrict number of bit vectors, to prevent out of Java heap memory
> ------------------------------------------------------------------
>
> Key: HIVE-4957
> URL: https://issues.apache.org/jira/browse/HIVE-4957
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.11.0
> Reporter: Brock Noland
> Assignee: Shreepadma Venugopalan
> Fix For: 0.13.0
>
> Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch
>
>
> Normally, increasing the number of bit vectors increases calculation accuracy.
> For example,
> {noformat}
> select compute_stats(a, 40) from test_hive;
> {noformat}
> generally gives better accuracy than
> {noformat}
> select compute_stats(a, 16) from test_hive;
> {noformat}
> But a larger number of bit vectors also makes the query run slower. Once the
> number of bit vectors exceeds 50, adding more no longer improves accuracy, but
> it still increases memory usage and crashes Hive if the number is too large.
> Hive currently doesn't prevent users from passing a ridiculously large number
> of bit vectors in a 'compute_stats' query.
> For example,
> {noformat}
> select compute_stats(a, 999999999) from column_eight_types;
> {noformat}
> crashes Hive.
> {noformat}
> 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0%
> 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec
> MapReduce Total cumulative CPU time: 290 msec
> Ended Job = job_1354923204155_0777 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
> Examining task ID: task_1354923204155_0777_m_000000 (and more) from job job_1354923204155_0777
> Task with the most failures(4):
> -----
> Task ID:
> task_1354923204155_0777_m_000000
> URL:
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_000000
> -----
> Diagnostic Messages for this Task:
> Error: Java heap space
> {noformat}
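The kind of guard the issue title asks for can be sketched as a simple bounds check done before any bit vectors are allocated. This is only an illustrative sketch, not the committed patch: the class name `BitVectorLimit`, the method `checkNumBitVectors`, the cap of 1024, and the use of `IllegalArgumentException` are all assumptions made for this example.

```java
// Illustrative sketch only; names and the cap value are hypothetical,
// not taken from GenericUDAFComputeStats.java.
final class BitVectorLimit {
    // Assumed upper bound, chosen for illustration.
    static final int MAX_BIT_VECTORS = 1024;

    // Returns the requested count if it is within bounds; otherwise fails
    // fast, before any memory proportional to the count is allocated.
    static int checkNumBitVectors(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException(
                "Number of bit vectors must be positive, but was " + requested);
        }
        if (requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "Number of bit vectors must not exceed " + MAX_BIT_VECTORS
                    + ", but was " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        // A reasonable value passes through unchanged.
        System.out.println(checkNumBitVectors(40));
        // The value from the crashing query is rejected up front.
        try {
            checkNumBitVectors(999999999);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, a query such as the `compute_stats(a, 999999999)` example would fail with a clear error message instead of exhausting the Java heap in the map task.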
--
This message was sent by Atlassian JIRA
(v6.1#6144)