[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883907#comment-13883907 ]

Hive QA commented on HIVE-6300:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12625428/HIVE-6300.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4961 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1054/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1054/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12625428

Add documentation for stats configs to hive-default.xml.template

Key: HIVE-6300
URL: https://issues.apache.org/jira/browse/HIVE-6300
Project: Hive
Issue Type: Sub-task
Components: Query Processor, Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
Fix For: 0.13.0
Attachments: HIVE-6300.1.patch, HIVE-6300.2.patch

Add documentation for the following configs:
hive.stats.max.variable.length
hive.stats.list.num.entries
hive.stats.map.num.entries
hive.stats.map.parallelism
hive.stats.fetch.column.stats
hive.stats.avg.row.size
hive.stats.join.factor
hive.stats.deserialization.factor
hive.stats.fetch.partition.stats

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
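Each config listed above becomes one entry in hive-default.xml.template. The sketch below shows what such an entry could look like as a Hadoop-style `<property>` element (name, value, description). It wraps the description so that no line exceeds the 100-character limit raised later in the thread.

Several details are assumptions and are not taken from HIVE-6300.2.patch: the helper name `render_property`, the indentation, and the placeholder value and text.

```python
import textwrap
from xml.sax.saxutils import escape

MAX_LINE = 100  # line-length limit asked for in the review


def render_property(name: str, value: str, description: str, indent: str = "  ") -> str:
    """Render one <property> entry (hypothetical helper; layout is an assumption)."""
    inner = indent * 2
    # Escape XML special characters, then wrap on whitespace only so escaped
    # entities such as "&lt;" are never split; the width includes the indentation.
    desc_lines = textwrap.wrap(
        escape(description),
        width=MAX_LINE,
        initial_indent=inner,
        subsequent_indent=inner,
        break_long_words=False,
        break_on_hyphens=False,
    )
    lines = [
        f"{indent}<property>",
        f"{inner}<name>{escape(name)}</name>",
        f"{inner}<value>{escape(value)}</value>",
        f"{inner}<description>",
        *desc_lines,
        f"{inner}</description>",
        f"{indent}</property>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder value and text: not the real default or the committed description.
    print(render_property(
        "hive.stats.deserialization.factor",
        "VALUE",
        "Placeholder description text for the optimizer's data-size estimates, "
        "long enough to show wrapping at the 100-character limit.",
    ))
```

Wrapping at render time keeps descriptions readable without forcing authors to drop spaces by hand to stay under the limit. A single token longer than 100 characters is left intact rather than split.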
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883920#comment-13883920 ]

Lefty Leverenz commented on HIVE-6300:
--

bq. I had a space earlier, but had to remove it because of the 100-char limit.

Then it's fine -- it won't confuse anybody. Thanks, [~prasanth_j].
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884415#comment-13884415 ]

Prasanth J commented on HIVE-6300:
--

Test failures are unrelated. HIVE-6310 and HIVE-6322 fix the failures.
[~leftylev] or [~rhbutani], can someone please commit this patch? Thanks.
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884483#comment-13884483 ]

Harish Butani commented on HIVE-6300:
-

+1
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883698#comment-13883698 ]

Lefty Leverenz commented on HIVE-6300:
--

But if I read it all again, I might find new nits to pick. (Tech writers are notorious for that.) Well, here goes.

Nit 1: "in Hive/Tez(for" needs a space in 3 config descriptions.
Nit 2: there is no Nit 2, so I'm not sure it's worth the effort of fixing Nit 1 (I can fix it later with other fixes).

Looks good.
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883701#comment-13883701 ]

Prasanth J commented on HIVE-6300:
--

I had a space earlier, but had to remove it because of the 100-char limit.
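This exchange turns on two checks that pull against each other: the 100-character line limit, and the missing space in "Hive/Tez(for". The minimal lint sketch below reports both kinds of problem for a template's lines.

The function name `lint_lines` and the regular expression are hypothetical. The parenthesis check is only a heuristic: it will also flag legitimate text such as `COUNT(*)`.

```python
import re
from typing import Iterable, List, Tuple

MAX_LINE = 100
# Heuristic: a word character immediately followed by "(", as in "Hive/Tez(for".
# It also matches legitimate text such as "COUNT(*)", so treat hits as hints.
NO_SPACE_BEFORE_PAREN = re.compile(r"\w\(")


def lint_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, message) pairs for lines that break either rule."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text) > MAX_LINE:
            problems.append((lineno, f"line is {len(text)} chars (limit {MAX_LINE})"))
        if NO_SPACE_BEFORE_PAREN.search(text):
            problems.append((lineno, "missing space before '('"))
    return problems
```

You can pass it the lines of an open file, for example a file handle opened on hive-default.xml.template. Running both checks together shows the trade-off described in the thread: adding the space can push a line past the limit. In that case the usual fix is to rewrap the description onto the next line rather than drop the space.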
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881972#comment-13881972 ]

Harish Butani commented on HIVE-6300:
-

looks good
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882153#comment-13882153 ]

Lefty Leverenz commented on HIVE-6300:
--

Good detailed descriptions. Just some nit-picks and a few points of confusion:

# Please limit the line lengths to 100 chars. (hive-default.xml.template is far from perfect on this convention, but I'm planning to tidy it up someday.)
# "hive/tez" should be "Hive/Tez" and "java" should be "Java" in these descriptions:
#* hive.stats.max.variable.length
#* hive.stats.list.num.entries
#* hive.stats.map.num.entries
# In hive.stats.map.parallelism description:
#* "through each of the operator" should be "operators" or "through each operator"
#* "Some operators like GROUPBY, generates more number of rows that corresponds to the number of mappers." -- omit the comma, make "generates" agree with the plural "operators", and I'm not sure what you mean by "more number of rows that corresponds to the number of mappers" -- what's the correspondence? Do more rows mean more parallelism? At first I thought "that" should be "than" but now I don't know. The comment in HiveConf.java is simpler: "to accurately compute statistics for GROUPBY map side parallelism needs to be known."
#* "hive" should be "Hive"
# In hive.stats.fetch.column.stats description, "for each needed columns" should be "column" and "when the number of columns are high" should be "is high". Also, why does the comment in HiveConf.java mention partitions too? Maybe it's left over from previous behavior, before hive.stats.fetch.partition.stats was created:
#* +// statistics annotation fetches column statistics for all required columns and for all +// required partitions which can be very expensive sometimes
# In hive.stats.fetch.partition.stats description, "paritition" should be "partition" and "when the number of partitions are high" should be "is high". Also, does this information mean the same as what's in HiveConf.java?
#* "When this flag is disabled, Hive will make calls to filesystem to get file sizes and will estimate the number of rows from row schema."
#* HiveConf.java: "basic sizes being fetched from namenode"
# In hive.stats.avg.row.size description:
#* again, "through each of the operator" should be "operators" or "through each operator"
#* "LIMIT operator (which knows the number of rows) will use this value to estimate the size of data flowing through LIMIT operator" left me wondering what's done to estimate data flowing through other operators. (But now I realize they're estimated using other configs. But isn't it the optimizer that uses this value, not the LIMIT operator?) Also, this description doesn't seem to match what's in HiveConf.java -- "average row size will be used to estimate the number of rows/data size" -- is the number of rows known or not?
# In hive.stats.join.factor description:
#* again, "through each of the operator" should be "operators" or "through each operator"
#* by the way, in HiveConf.java the comment is slightly garbled: "in the absence of column statistics, the estimated number of rows/data size that will be emitted from join operator will depend on t this factor"
# In hive.stats.deserialization.factor description:
#* again, "through each of the operator" should be "operators" or "through each operator"
#* "Since files in table/partitions are ..." should be "tables/partitions" (micro-nit)

Whew. Sorry about the number of nits. If you like, I can make these changes in a temporary patch and let you remove the ones you don't like and clear up confusions in a third patch.
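Several review questions concern how three per-operator estimates are formed: LIMIT with hive.stats.avg.row.size, join with hive.stats.join.factor, and map-side GROUPBY with hive.stats.map.parallelism. The sketch below shows one plausible reading of each. It is not Hive's optimizer code.

All function names are hypothetical, and two of the rules are assumptions made for illustration:
* Without column statistics, the join estimate scales the largest input by the factor.
* Each mapper emits at most one partial GROUPBY row per distinct key.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stats:
    num_rows: int
    data_size: int  # bytes


def limit_stats(limit: int, avg_row_size: int) -> Stats:
    # The row count comes from the LIMIT clause; avg_row_size stands in for
    # hive.stats.avg.row.size and turns that count into a data-size estimate.
    return Stats(limit, limit * avg_row_size)


def join_stats_without_column_stats(inputs: Sequence[Stats], join_factor: float) -> Stats:
    # Assumed rule for illustration: with no column statistics, scale the largest
    # input (by row count) by join_factor, standing in for hive.stats.join.factor.
    largest = max(inputs, key=lambda s: s.num_rows)
    return Stats(int(largest.num_rows * join_factor), int(largest.data_size * join_factor))


def map_side_groupby_rows(input_rows: int, distinct_keys: int, parallelism: int) -> int:
    # Upper bound: each of `parallelism` mappers (cf. hive.stats.map.parallelism)
    # emits at most one partial-aggregate row per distinct key, and the mappers
    # together cannot emit more rows than they read.
    return min(input_rows, distinct_keys * parallelism)
```

Under this reading, 4 mappers and 10 distinct keys give up to 40 partial rows leaving the map side, which the reduce-side GROUPBY then merges. That is one way to see why the estimate depends on the number of mappers.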
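The hive.stats.fetch.partition.stats description quoted above says that when the flag is disabled, Hive "will estimate the number of rows from row schema" using file sizes. This is where the variable-length configs and the deserialization factor come in. The minimal sketch below estimates a row size from a schema, then estimates rows from a file size.

It assumes the following, none of which comes from Hive's own code:
* Strings and binaries count as the max.variable.length value.
* Lists and maps count as entries × element size.
* The deserialization factor multiplies the raw file size.

The type widths and function names are hypothetical.

```python
from typing import Sequence, Tuple, Union

# Illustrative fixed widths in bytes; an assumption, not Hive's internal sizes.
FIXED_WIDTH = {"boolean": 1, "int": 4, "bigint": 8, "double": 8}

# A column type is a primitive name, ("list", elem_type) or ("map", key_type, value_type).
ColType = Union[str, Tuple]


def type_size(t: ColType, max_var_len: int, list_entries: int, map_entries: int) -> int:
    """Estimated bytes for one value of type t when no column statistics exist."""
    if isinstance(t, tuple):
        if t[0] == "list":
            # list_entries stands in for hive.stats.list.num.entries
            return list_entries * type_size(t[1], max_var_len, list_entries, map_entries)
        if t[0] == "map":
            # map_entries stands in for hive.stats.map.num.entries
            key = type_size(t[1], max_var_len, list_entries, map_entries)
            value = type_size(t[2], max_var_len, list_entries, map_entries)
            return map_entries * (key + value)
        raise ValueError(f"unsupported complex type: {t[0]}")
    if t in ("string", "binary"):
        # max_var_len stands in for hive.stats.max.variable.length
        return max_var_len
    return FIXED_WIDTH[t]


def estimate_from_file_size(
    file_size: int,
    schema: Sequence[ColType],
    deser_factor: float,
    max_var_len: int,
    list_entries: int,
    map_entries: int,
) -> Tuple[int, int]:
    """Return (estimated rows, estimated data size) from a raw file size and row schema."""
    row_size = sum(type_size(t, max_var_len, list_entries, map_entries) for t in schema)
    # Assumption: deser_factor (cf. hive.stats.deserialization.factor) converts the
    # on-disk size of serialized/compressed files into an estimated in-memory size.
    data_size = int(file_size * deser_factor)
    return data_size // row_size, data_size
```

Consider a schema of int, string, list<bigint> and map<string,int>, with parameter values 100, 10 and 10. Under the assumed widths, the estimated row is 4 + 100 + 80 + 1040 = 1224 bytes. A 12240-byte file with a factor of 1.0 then yields an estimate of 10 rows.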
[jira] [Commented] (HIVE-6300) Add documentation for stats configs to hive-default.xml.template
[ https://issues.apache.org/jira/browse/HIVE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880658#comment-13880658 ]

Prasanth J commented on HIVE-6300:
--

Added documentation for the configs in the description. [~leftylev] or [~rhbutani], can you please take a look and see if it looks good?