[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883907#comment-13883907
 ] 

Hive QA commented on HIVE-6300:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12625428/HIVE-6300.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4961 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1054/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1054/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12625428

 Add documentation for stats configs to hive-default.xml.template
 

 Key: HIVE-6300
 URL: https://issues.apache.org/jira/browse/HIVE-6300
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-6300.1.patch, HIVE-6300.2.patch


 Add documentation for the following configs
 hive.stats.max.variable.length
 hive.stats.list.num.entries
 hive.stats.map.num.entries
 hive.stats.map.parallelism
 hive.stats.fetch.column.stats
 hive.stats.avg.row.size
 hive.stats.join.factor
 hive.stats.deserialization.factor
 hive.stats.fetch.partition.stats



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883920#comment-13883920
 ] 

Lefty Leverenz commented on HIVE-6300:
--

bq.  I had a space earlier, but had to remove it because of 100 chars limit.

Then it's fine -- it won't confuse anybody.  Thanks, [~prasanth_j].



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-28 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884415#comment-13884415
 ] 

Prasanth J commented on HIVE-6300:
--

Test failures are unrelated. HIVE-6310 and HIVE-6322 fix the failures. 
[~leftylev] or [~rhbutani], can someone please commit this patch? Thanks.



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-28 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884483#comment-13884483
 ] 

Harish Butani commented on HIVE-6300:
-

+1



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-27 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883698#comment-13883698
 ] 

Lefty Leverenz commented on HIVE-6300:
--

But if I read it all again, I might find new nits to pick.  (Tech writers are 
notorious for that.)  Well, here goes.

Nit 1:  "in Hive/Tez(for" needs a space in 3 config descriptions
Nit 2:  there is no Nit 2, so I'm not sure it's worth the effort of fixing Nit 
1 (I can fix it later with other fixes)

Looks good.



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-27 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883701#comment-13883701
 ] 

Prasanth J commented on HIVE-6300:
--

I had a space earlier, but had to remove it because of 100 chars limit.



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-25 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881972#comment-13881972
 ] 

Harish Butani commented on HIVE-6300:
-

looks good



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-25 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882153#comment-13882153
 ] 

Lefty Leverenz commented on HIVE-6300:
--

Good detailed descriptions.  Just some nit-picks and a few points of confusion: 

# Please limit the line lengths to 100 chars.  (hive-default.xml.template is 
far from perfect on this convention, but I'm planning to tidy it up someday.)
# "hive/tez" should be "Hive/Tez" and "java" should be "Java" in these 
descriptions:
#* hive.stats.max.variable.length
#* hive.stats.list.num.entries
#* hive.stats.map.num.entries
# In hive.stats.map.parallelism description:
#* "through each of the operator" should be "operators" or "through each 
operator"
#* "Some operators like GROUPBY, generates more number of rows that corresponds 
to the number of mappers." -- omit the comma, make "generates" singular, and 
I'm not sure what you mean by "more number of rows that corresponds to the 
number of mappers" -- what's the correspondence, more rows means more 
parallelism?  At first I thought "that" should be "than" but now I don't know.  
The comment in HiveConf.java is simpler:  "to accurately compute statistics for 
GROUPBY map side parallelism needs to be known."
#* "hive" should be "Hive"
# In hive.stats.fetch.column.stats description, "for each needed columns" 
should be "column" and "when the number of columns are high" should be "is 
high."  Also, why does the comment in HiveConf.java mention partitions too?  
Maybe it's left over from previous behavior, before 
hive.stats.fetch.partition.stats was created:
#* +// statistics annotation fetches column statistics for all required 
columns and for all
+// required partitions which can be very expensive sometimes
# In hive.stats.fetch.partition.stats description, "paritition" should be 
"partition" and "when the number of partitions are high" should be "is high."  
Also, does this information mean the same as what's in HiveConf.java?
#* "When this flag is disabled, Hive will make calls to filesystem to get file 
sizes and will estimate the number of rows from row schema."
#* HiveConf.java:  "basic sizes being fetched from namenode"
# In hive.stats.avg.row.size description:
#* again, "through each of the operator" should be "operators" or "through each 
operator"
#* "LIMIT operator (which knows the number of rows) will use this value to 
estimate the size of data flowing through LIMIT operator" left me wondering 
what's done to estimate data flowing through other operators.  (But now I 
realize they're estimated using other configs.  But isn't it the optimizer that 
uses this value, not the LIMIT operator?)  Also, this description doesn't seem 
to match what's in HiveConf.java -- "average row size will be used to estimate 
the number of rows/data size" -- is number of rows known or not?
# In hive.stats.join.factor description:
#* again, "through each of the operator" should be "operators" or "through each 
operator"
#* by the way, in HiveConf.java the comment is slightly garbled:  "in the 
absence of column statistics, the estimated number of rows/data size that will 
be emitted from join operator will depend on t this factor"
# In hive.stats.deserialization.factor description:
#* again, "through each of the operator" should be "operators" or "through each 
operator"
#* "Since files in table/partitions are ..." should be "tables/partitions" 
(micro-nit)

Whew.  Sorry about the number of nits.  If you like, I can make these changes 
in a temporary patch and let you remove the ones you don't like and clear up 
confusions in a third patch.
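
The 100-character convention from item 1 could also be checked mechanically 
before a patch is uploaded.  The following is only a minimal sketch, not part 
of Hive's build or of any patch on this issue:  the function name 
find_long_lines is hypothetical, and it assumes the limit counts characters on 
each line excluding the trailing newline.

```python
from typing import List, Tuple

# Limit taken from the review comment; assumed to exclude the newline character.
MAX_LEN = 100


def find_long_lines(text: str, max_len: int = MAX_LEN) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_len.

    Line numbers are 1-based so they match what an editor would show.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > max_len
    ]
```

To use it, read hive-default.xml.template into a string, pass it to 
find_long_lines, and report each returned pair; an empty list means every line 
is within the limit.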



[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template

2014-01-23 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880658#comment-13880658
 ] 

Prasanth J commented on HIVE-6300:
--

Added documentation for the configs listed in the description. [~leftylev] or 
[~rhbutani], can you please take a look and see if it looks good?
