[jira] Commented: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed
[ https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869246#action_12869246 ]

He Yongqiang commented on HIVE-1348:

1. I renamed ExecMapperContext to ExecMapperLocalContext because right now it is only used for map joins. But I can revert it, because 'ExecMapperContext' is more general and can be used later for other cases.
2. Yes. We need to use ExecMapper.memoryMXBean to report memory usage in ExecMapperLocalContext, because it can be used to track memory usage for mapjoin's local work. And once ExecMapper.memoryMXBean is public, it can also be used in other places.
3. Will do.

Moving inputFileChanged() from ExecMapper to where it is needed
---
Key: HIVE-1348
URL: https://issues.apache.org/jira/browse/HIVE-1348
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang
Attachments: hive-1348.1.patch

inputFileChanged() is only needed for bucketed sort-merge map join. It should not be called in ExecMapper.map(), where every code path hits it. The function is quite expensive, since each JobConf lookup is a hash-table lookup.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
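Point 2 above relies on the JVM's standard `java.lang.management.MemoryMXBean`. A minimal sketch, assuming a hypothetical `LocalWorkMemoryReporter` helper (not the actual `ExecMapperLocalContext` code in the patch), of how map-join local work could sample heap usage through that platform bean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not Hive code: samples heap usage through the
// JVM-wide MemoryMXBean, the same kind of bean ExecMapper.memoryMXBean holds.
class LocalWorkMemoryReporter {
    // ManagementFactory returns the platform's single MemoryMXBean instance.
    static final MemoryMXBean MEMORY_MX_BEAN = ManagementFactory.getMemoryMXBean();

    // Fraction of the maximum heap currently in use, or -1 if max is undefined.
    static double heapUsedFraction() {
        MemoryUsage heap = MEMORY_MX_BEAN.getHeapMemoryUsage();
        long max = heap.getMax(); // getMax() returns -1 when undefined
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    // Human-readable line suitable for a task log.
    static String report(String stage) {
        MemoryUsage heap = MEMORY_MX_BEAN.getHeapMemoryUsage();
        return stage + ": used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax();
    }
}
```

Because the JVM has a single MemoryMXBean instance, sharing one reference (as making `ExecMapper.memoryMXBean` public would allow) costs nothing extra; the choice is only about where the reporting code lives.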
[jira] Updated: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed
[ https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1348:

Attachment: hive-1348.2.patch

Moving inputFileChanged() from ExecMapper to where it is needed
---
Key: HIVE-1348
URL: https://issues.apache.org/jira/browse/HIVE-1348
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang
Attachments: hive-1348.1.patch, hive-1348.2.patch

inputFileChanged() is only needed for bucketed sort-merge map join. It should not be called in ExecMapper.map(), where every code path hits it. The function is quite expensive, since each JobConf lookup is a hash-table lookup.
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869284#action_12869284 ]

Edward Capriolo commented on HIVE-1351:

As Ning mentioned, why move the CLI code? If anything, more of the code should move up into the main script rather than into smaller scripts. I see people making changes only to the CLI. We have to make sure that fixes for things like Cygwin get propagated to all files, or that shared code actually gets shared. Also, rcfilecat is just a debug utility, but it should still have a unit test, right? Just cat two files to make sure it works?

Tool to cat rcfiles
---
Key: HIVE-1351
URL: https://issues.apache.org/jira/browse/HIVE-1351
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Attachments: hive.1351.1.patch, hive.1351.2.patch

It will be useful for debugging.
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869287#action_12869287 ]

Venky Iyer commented on HIVE-1351:

I think we'll end up using rcfilecat for a lot of things, not just debugging (for example, to dump small tables for off-Hive processing). It should be treated as production code, IMO.
[jira] Commented: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed
[ https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869299#action_12869299 ]

He Yongqiang commented on HIVE-1348:

1. We do not want to check the conf twice to see whether the input file has changed; that is what the variable inputFileChanged is for. Maybe we should give 'inputFileChanged()' a better name (checkInputFileChanged()?).
2. I will change the variable name.
3. No, they will not change the mapjoin behavior. That code is executed only once for a normal mapjoin.
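Point 1 describes doing the input-file check once and caching the result in a field that other code can read. A minimal sketch of that pattern, assuming a plain `Map<String, String>` stands in for `JobConf` and that the current file comes from the Hadoop 0.20-era `map.input.file` property; `InputFileTracker` and its method names are illustrative, not the patch's actual code:

```java
import java.util.Map;
import java.util.Objects;

// Illustrative only: a Map stands in for JobConf, whose get() is a hash lookup.
class InputFileTracker {
    // Assumed property key holding the path of the split being read.
    static final String INPUT_FILE_KEY = "map.input.file";

    private final Map<String, String> conf;
    private boolean seenAnyFile;      // false until the first check
    private String lastInputFile;     // file seen on the previous check
    private boolean inputFileChanged; // cached result of the last check

    InputFileTracker(Map<String, String> conf) {
        this.conf = conf;
    }

    // Does the conf lookup once and caches the answer.
    boolean checkInputFileChanged() {
        String current = conf.get(INPUT_FILE_KEY);
        inputFileChanged = !seenAnyFile || !Objects.equals(lastInputFile, current);
        seenAnyFile = true;
        lastInputFile = current;
        return inputFileChanged;
    }

    // Cheap accessor for any other code that needs the same answer,
    // so the conf is not consulted a second time.
    boolean isInputFileChanged() {
        return inputFileChanged;
    }
}
```

Under the issue's premise, only the bucketed sort-merge map-join path would call `checkInputFileChanged()`; every other code path would skip the lookup entirely.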
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869306#action_12869306 ]

He Yongqiang commented on HIVE-1351:

I moved almost all of the cli.sh code to util/execHiveCmd because that code can be shared with rcfilecat. I think this code should be independent (we may add new commands in the future).

Tool to cat rcfiles
---
Key: HIVE-1351
URL: https://issues.apache.org/jira/browse/HIVE-1351
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Fix For: 0.6.0
Attachments: hive.1351.1.patch, hive.1351.2.patch

It will be useful for debugging.
[jira] Created: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
---
Key: HIVE-1352
URL: https://issues.apache.org/jira/browse/HIVE-1352
Project: Hadoop Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang

Talked to Venky: rcfilecat needs to add column and line delimiters.
[jira] Updated: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
[ https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1352:

Attachment: hive.1352.1.patch

rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
---
Key: HIVE-1352
URL: https://issues.apache.org/jira/browse/HIVE-1352
Project: Hadoop Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: hive.1352.1.patch

Talked to Venky, rcfilecat needs to add column and line delimiters.
[jira] Commented: (HIVE-1350) hive.query.id is not unique
[ https://issues.apache.org/jira/browse/HIVE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869323#action_12869323 ]

John Sichi commented on HIVE-1350:

+1. Will commit if tests pass.

hive.query.id is not unique
---
Key: HIVE-1350
URL: https://issues.apache.org/jira/browse/HIVE-1350
Project: Hadoop Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1350.1.patch, hive.1350.2.patch

hive.query.id is not unique if commands are executed by the same user within the same second.
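The description suggests the ID is built from the user name plus a timestamp with one-second resolution, so two commands from the same user in the same second collide. A minimal sketch, assuming a hypothetical `QueryIdGenerator` (not the committed HIVE-1350 patch), that keeps a readable user/timestamp prefix but appends a random UUID so every ID is distinct:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

// Hypothetical generator, not Hive's code: a user + second prefix alone
// collides when one user issues two commands within the same second, so a
// random UUID suffix is appended to make every ID distinct.
class QueryIdGenerator {
    static String newQueryId(String user) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        String second = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
        return user + "_" + second + "_" + UUID.randomUUID();
    }
}
```

A per-process counter would also break ties within one JVM, but a UUID additionally avoids collisions between separate CLI processes run by the same user.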
[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
[ https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869327#action_12869327 ]

Venky Iyer commented on HIVE-1352:

Why '\r\n'?
[jira] Created: (HIVE-1353) load_dyn_part*.q tests need ORDER BY for determinism
load_dyn_part*.q tests need ORDER BY for determinism
---
Key: HIVE-1353
URL: https://issues.apache.org/jira/browse/HIVE-1353
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Ning Zhang
Fix For: 0.6.0

Just now got a spurious failure from this while testing something else:

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/jsichi/open/commit-trunk/.ptest_0/build/ql/test/logs/clientpositive/load_dyn_part14.q.out /data/users/jsichi/open/commit-trunk/.ptest_0/ql/src/test/results/clientpositive/load_dyn_part14.q.out
[junit] 261,262d260
[junit] k1__HIVE_DEFAULT_PARTITION__
[junit] k1__HIVE_DEFAULT_PARTITION__
[junit] 264a263,264
[junit] k1__HIVE_DEFAULT_PARTITION__
[junit] k1__HIVE_DEFAULT_PARTITION__
[junit] Exception: Client execution results failed with error code = 1
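The diff above fails only because identical rows came back in a different order; without an ORDER BY, a query makes no guarantee about row order. A minimal Java sketch contrasting an order-sensitive comparison (what a line-by-line diff effectively does) with one made deterministic by sorting first; `ResultComparison` is illustrative only, and the fix proposed in this issue is adding ORDER BY to the `.q` files themselves:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: shows why unordered query output makes a line-by-line
// diff flaky, and how imposing an order makes the comparison deterministic.
class ResultComparison {
    // What a plain diff effectively checks: same rows in the same order.
    static boolean orderedEquals(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    // Sorting both sides gives a fixed order regardless of which task
    // emitted its rows first, the determinism an ORDER BY provides.
    static boolean sortedEquals(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }
}
```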
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869344#action_12869344 ]

Edward Capriolo commented on HIVE-1351:

This is so nitpicky, but
{noformat}
+--rcfilecat)
+  SERVICE=rcfilecat
+  shift
+  ;;
{noformat}
I do not think we should do this. We are just providing alternate invocations that end up being more confusing. Why should you be able to do this:
{noformat}
hive --rcfilecat
{noformat}
but not this?
{noformat}
hive --hwi
{noformat}
As for execHiveCmd: if you want to share it, why not move it up into bin/hive? We do not need to add a separate shared file when the functions defined in bin/hive are already shared.
[jira] Created: (HIVE-1354) partition level properties honored if it exists
partition level properties honored if it exists
---
Key: HIVE-1354
URL: https://issues.apache.org/jira/browse/HIVE-1354
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain

drop table partition_test_partitioned;
create table partition_test_partitioned(key string, value string) partitioned by (dt string);
alter table partition_test_partitioned set fileformat rcfile;
insert overwrite table partition_test_partitioned partition(dt=101) select * from src1;
show table extended like partition_test_partitioned partition(dt=101);
alter table partition_test_partitioned set fileformat Sequencefile;
insert overwrite table partition_test_partitioned partition(dt=102) select * from src1;
show table extended like partition_test_partitioned partition(dt=102);
insert overwrite table partition_test_partitioned partition(dt=101) select * from src1;
show table extended like partition_test_partitioned partition(dt=101);
drop table partition_test_partitioned;

Partition (dt=101) still points to RCFile, since it was created as an RCFile.
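The repro shows that a partition keeps the file format it was created with even after the table-level format changes, so readers have to consult the partition's own metadata first. A minimal sketch, assuming a hypothetical in-memory `PartitionedTable` model (not Hive's metastore classes), of resolving a partition's format by preferring partition-level metadata and falling back to the table's:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Hive's metastore API: the table keeps a default
// file format, and each partition records the format it was written with.
class PartitionedTable {
    private String tableFileFormat;
    private final Map<String, String> partitionFileFormats = new HashMap<>();

    PartitionedTable(String tableFileFormat) {
        this.tableFileFormat = tableFileFormat;
    }

    // ALTER TABLE ... SET FILEFORMAT: changes the table default only.
    void setFileFormat(String format) {
        tableFileFormat = format;
    }

    // INSERT OVERWRITE into a partition: a new partition inherits the current
    // table format; an existing one keeps the format it was created with.
    void insertOverwrite(String partSpec) {
        partitionFileFormats.putIfAbsent(partSpec, tableFileFormat);
    }

    // Honor partition-level metadata if it exists, else use the table's.
    String formatFor(String partSpec) {
        return partitionFileFormats.getOrDefault(partSpec, tableFileFormat);
    }
}
```

Replaying the repro against this model leaves dt=101 as RCFile and dt=102 as SequenceFile, matching the behavior reported above.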
[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
[ https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869434#action_12869434 ]

Namit Jain commented on HIVE-1352:

Don't put a TAB after the last column.
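Taken together, the comments on HIVE-1352 ask for tab-separated columns, no tab after the last column, and a row terminator, with Venky questioning whether that terminator should be '\r\n'. A minimal sketch, assuming a hypothetical `RowFormatter.formatRow` helper rather than the actual rcfilecat code, that leaves the terminator as a parameter since the thread questions that choice:

```java
import java.util.List;

// Hypothetical helper, not rcfilecat itself: joins column values with '\t'
// (so no tab follows the last column) and ends the row with the given
// terminator, e.g. "\n" or "\r\n".
class RowFormatter {
    static String formatRow(List<String> columns, String rowTerminator) {
        return String.join("\t", columns) + rowTerminator;
    }
}
```

Using a join instead of appending '\t' after every value is what avoids the trailing TAB Namit points out.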
[jira] Created: (HIVE-1355) Hive should use NullOutputFormat for hadoop jobs
Hive should use NullOutputFormat for hadoop jobs
---
Key: HIVE-1355
URL: https://issues.apache.org/jira/browse/HIVE-1355
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Joydeep Sen Sarma

See https://issues.apache.org/jira/browse/MAPREDUCE-1802.

Hive doesn't depend on the Hadoop job output folder; it produces output exclusively via side-effect folders. We should use an OutputFormat that can request that Hadoop skip job setup/cleanup. This could be NullOutputFormat (unless there are objections in Hadoop to changing NullOutputFormat's behavior). As a small side benefit, it also avoids some totally unnecessary file creates and deletes in HDFS.
[jira] Updated: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
[ https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1352:

Attachment: hive.1352.2.patch

rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
---
Key: HIVE-1352
URL: https://issues.apache.org/jira/browse/HIVE-1352
Project: Hadoop Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: hive.1352.1.patch, hive.1352.2.patch

Talked to Venky, rcfilecat needs to add column and line delimiters.
[jira] Created: (HIVE-1356) Allow uncommitted inserts and commit explicitly
Allow uncommitted inserts and commit explicitly
---
Key: HIVE-1356
URL: https://issues.apache.org/jira/browse/HIVE-1356
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Raghotham Murthy

Uncommitted inserts should not show up in SHOW TABLES, SHOW PARTITIONS, etc. We would like to use an explicit commit to make partitions/tables visible after we have inserted all the data that we want. This feature becomes important when there are multi-partition or multi-table inserts. Consumers of the tables/partitions can then wait on just one of the partitions (or a top-level partition) and be certain that they will not start reading a table while it is still being written to.
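A minimal sketch of the requested visibility semantics, assuming a hypothetical in-memory `StagedCatalog` (this is not Hive code, and no such commit statement is specified in the issue): inserted partitions are staged and hidden from listings until an explicit commit publishes all of them at once, so a consumer never sees only part of a multi-partition insert.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal, not Hive code.
class StagedCatalog {
    private final Set<String> visible = new LinkedHashSet<>();
    private final Set<String> pending = new LinkedHashSet<>();

    // An insert only stages the partition; it is not listed yet.
    synchronized void insert(String partition) {
        pending.add(partition);
    }

    // Explicit commit: every staged partition becomes visible together.
    synchronized void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    // What SHOW PARTITIONS would report: committed partitions only.
    synchronized List<String> showPartitions() {
        return Collections.unmodifiableList(new ArrayList<>(visible));
    }
}
```

Because `commit()` publishes all staged partitions under one lock, a reader that sees any of them also sees the rest, which is what lets consumers wait on a single partition.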
[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
[ https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869481#action_12869481 ]

Namit Jain commented on HIVE-1352:

+1