[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801560#comment-13801560
 ] 

Hudson commented on HIVE-4957:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #147 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/147/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory 
(Shreepadma Venugopalan via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1534337)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out


 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801748#comment-13801748
 ] 

Hudson commented on HIVE-4957:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2413 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2413/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory 
(Shreepadma Venugopalan via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1534337)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out


 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801766#comment-13801766
 ] 

Hudson commented on HIVE-4957:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #515 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/515/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory 
(Shreepadma Venugopalan via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1534337)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out


 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-21 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801070#comment-13801070
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

Thanks, Brock!

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801123#comment-13801123
 ] 

Hudson commented on HIVE-4957:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #209 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/209/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory 
(Shreepadma Venugopalan via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1534337)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out


 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799450#comment-13799450
 ] 

Brock Noland commented on HIVE-4957:


+1

Carl do you have any more concerns?

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-01 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783358#comment-13783358
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

New patch addresses review comments.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783617#comment-13783617
 ] 

Hive QA commented on HIVE-4957:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12606199/HIVE-4957.2.patch

{color:green}SUCCESS:{color} +1 4078 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/987/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/987/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-09-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773369#comment-13773369
 ] 

Brock Noland commented on HIVE-4957:


LGTM, let's see what the tests say.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch


 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-09-20 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773351#comment-13773351
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

RB: https://reviews.apache.org/r/14250/

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan

 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira