[jira] Commented: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed

2010-05-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869246#action_12869246
 ] 

He Yongqiang commented on HIVE-1348:


1.
I changed ExecMapperContext to ExecMapperLocalContext because right now it is 
only used for map joins. But I can revert it, because 'ExecMapperContext' is 
more general and can be used later for other cases.
2. 
Yes. We need to use ExecMapper.memoryMXBean to report memory usage in 
ExecMapperLocalContext. This is because it can be used to track memory usage 
for mapjoin's local work. And once ExecMapper.memoryMXBean is public, it can 
also be used in other places.
3. 
Will do.
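A minimal sketch, for illustration only, of how map join local work could report heap usage through one shared MemoryMXBean, as item 2 proposes. LocalWorkMemoryReporter, its methods, and the 90% threshold are hypothetical; ManagementFactory.getMemoryMXBean() and MemoryUsage are standard JDK APIs. This is not the code in the attached patch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper; in the patch, a public static ExecMapper.memoryMXBean
// would be the shared bean that local map join work reads from.
class LocalWorkMemoryReporter {
  static final MemoryMXBean MEMORY_MX_BEAN = ManagementFactory.getMemoryMXBean();

  // Fraction of the maximum heap currently in use, or -1.0 if the JVM
  // reports no defined maximum (getMax() returns -1 in that case).
  static double heapUsedFraction() {
    MemoryUsage heap = MEMORY_MX_BEAN.getHeapMemoryUsage();
    long max = heap.getMax();
    if (max <= 0) {
      return -1.0;
    }
    return (double) heap.getUsed() / max;
  }

  // One log line describing current heap usage for a named stage.
  static String report(String stage) {
    MemoryUsage heap = MEMORY_MX_BEAN.getHeapMemoryUsage();
    return String.format("%s: heap used = %d bytes, max = %d bytes",
        stage, heap.getUsed(), heap.getMax());
  }

  public static void main(String[] args) {
    System.out.println(report("mapjoin local work"));
    // Hypothetical threshold: warn when more than 90% of the heap is used.
    if (heapUsedFraction() > 0.9) {
      System.out.println("warning: heap usage above 90%");
    }
  }
}
```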

 Moving inputFileChanged() from ExecMapper to where it is needed
 ---

 Key: HIVE-1348
 URL: https://issues.apache.org/jira/browse/HIVE-1348
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang
 Attachments: hive-1348.1.patch


 inputFileChanged() is only needed for bucketed sort-merge map join. It should 
 not be called from ExecMapper.map(), where every code path hits this function. 
 This function is quite expensive, since each JobConf lookup is a hash table 
 lookup.
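A rough sketch of keeping the per-row lookup off the common path, assuming a HashMap-backed config as a stand-in for JobConf and assuming the mapper knows from the plan whether a bucketed sort-merge map join is present. CountingConf, MapperSketch, and the map.input.file key used here are illustrative assumptions, not Hive's actual classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Stand-in for JobConf: a hash-table-backed config that counts get() calls.
class CountingConf {
  private final Map<String, String> props = new HashMap<>();
  int lookups = 0;

  void set(String key, String value) {
    props.put(key, value);
  }

  String get(String key) {
    lookups++; // each call is one hash table lookup
    return props.get(key);
  }
}

// Hypothetical mapper: only plans containing a bucketed sort-merge map join
// pay for the per-row input-file check; every other plan skips it entirely.
class MapperSketch {
  static final String INPUT_FILE_KEY = "map.input.file"; // assumed property name
  private final CountingConf conf;
  private final boolean hasSortMergeJoin; // assumed to be derived from the plan once
  private String lastInputFile;
  private boolean seenFirstRow = false;
  int fileChanges = 0;

  MapperSketch(CountingConf conf, boolean hasSortMergeJoin) {
    this.conf = conf;
    this.hasSortMergeJoin = hasSortMergeJoin;
  }

  void map(String row) {
    if (hasSortMergeJoin) {
      checkInputFileChanged();
    }
    // ... forward the row to the operator tree (omitted) ...
  }

  private void checkInputFileChanged() {
    String current = conf.get(INPUT_FILE_KEY);
    if (!seenFirstRow || !Objects.equals(current, lastInputFile)) {
      fileChanges++;
      lastInputFile = current;
      seenFirstRow = true;
    }
  }

  public static void main(String[] args) {
    CountingConf plainConf = new CountingConf();
    plainConf.set(INPUT_FILE_KEY, "bucket_0");
    MapperSketch plain = new MapperSketch(plainConf, false);
    for (int i = 0; i < 1000; i++) plain.map("row" + i);
    System.out.println("lookups without SMB join: " + plainConf.lookups); // 0

    CountingConf smbConf = new CountingConf();
    smbConf.set(INPUT_FILE_KEY, "bucket_0");
    MapperSketch smb = new MapperSketch(smbConf, true);
    for (int i = 0; i < 1000; i++) smb.map("row" + i);
    System.out.println("lookups with SMB join: " + smbConf.lookups); // 1000
  }
}
```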

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed

2010-05-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1348:
---

Attachment: hive-1348.2.patch

 Moving inputFileChanged() from ExecMapper to where it is needed
 ---

 Key: HIVE-1348
 URL: https://issues.apache.org/jira/browse/HIVE-1348
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang
 Attachments: hive-1348.1.patch, hive-1348.2.patch


 inputFileChanged() is only needed for bucketed sort-merge map join. It should 
 not be called from ExecMapper.map(), where every code path hits this function. 
 This function is quite expensive, since each JobConf lookup is a hash table 
 lookup.




[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869284#action_12869284
 ] 

Edward Capriolo commented on HIVE-1351:
---

As Ning mentioned, why move the CLI code? If anything, more of the code should 
be moving up into the main script rather than into smaller scripts. I see 
people making changes only to the CLI. We have to make sure that fixes for 
things like Cygwin get propagated to all files, or that shared code actually 
gets shared.

Also, rcfilecat is just a debug util, but it should have a unit test, right? 
Just cat two files to make sure it works?

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread Venky Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869287#action_12869287
 ] 

Venky Iyer commented on HIVE-1351:
--

I think we'll end up using rcfilecat for a lot more than debugging (e.g., to 
dump small tables for off-Hive processing). It should be treated as production 
code, IMO.

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Commented: (HIVE-1348) Moving inputFileChanged() from ExecMapper to where it is needed

2010-05-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869299#action_12869299
 ] 

He Yongqiang commented on HIVE-1348:


1.
We do not want to check the conf twice to see whether the input file has 
changed. That is what the variable inputFileChanged is for. Maybe we should 
give 'inputFileChanged()' a better name (checkInputFileChanged()?).
2.
I will change the variable name.
3.
No, they will not change the map join behavior. That code will only be 
executed once for a normal map join.
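To make item 1 concrete, a hypothetical sketch of the check-once pattern: checkInputFileChanged() does the conf lookup and stores the result, and inputFileChanged() just returns the stored flag, so a later caller handling the same row does not touch the conf again. InputFileTracker and the Map standing in for JobConf are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of "check once, then read the cached flag".
class InputFileTracker {
  private final Map<String, String> conf; // stand-in for JobConf
  private final String inputFileKey;
  private String currentFile;
  private boolean initialized = false;
  private boolean inputFileChanged = false; // result of the most recent check
  int confLookups = 0;

  InputFileTracker(Map<String, String> conf, String inputFileKey) {
    this.conf = conf;
    this.inputFileKey = inputFileKey;
  }

  // Performs the conf lookup; intended to run once per row.
  boolean checkInputFileChanged() {
    confLookups++;
    String file = conf.get(inputFileKey);
    inputFileChanged = !initialized || !Objects.equals(file, currentFile);
    currentFile = file;
    initialized = true;
    return inputFileChanged;
  }

  // Cheap accessor for later callers handling the same row: no conf lookup.
  boolean inputFileChanged() {
    return inputFileChanged;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("map.input.file", "part-00000");
    InputFileTracker tracker = new InputFileTracker(conf, "map.input.file");
    System.out.println(tracker.checkInputFileChanged()); // true: first file
    System.out.println(tracker.inputFileChanged());      // true, read from the cached flag
    System.out.println(tracker.checkInputFileChanged()); // false: same file
    conf.put("map.input.file", "part-00001");
    System.out.println(tracker.checkInputFileChanged()); // true: file switched
    System.out.println(tracker.confLookups);             // 3
  }
}
```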

 Moving inputFileChanged() from ExecMapper to where it is needed
 ---

 Key: HIVE-1348
 URL: https://issues.apache.org/jira/browse/HIVE-1348
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang
 Attachments: hive-1348.1.patch, hive-1348.2.patch


 inputFileChanged() is only needed for bucketed sort-merge map join. It should 
 not be called from ExecMapper.map(), where every code path hits this function. 
 This function is quite expensive, since each JobConf lookup is a hash table 
 lookup.




[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869306#action_12869306
 ] 

He Yongqiang commented on HIVE-1351:


I moved almost all of the cli.sh code to util/execHiveCmd because this code can 
be shared with rcfilecat. I think this code should be independent (we may add 
new commands in the future).

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Created: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread He Yongqiang (JIRA)
rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
each row.
--

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


As discussed with Venky, rcfilecat needs to add column and line delimiters.




[jira] Updated: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1352:
---

Attachment: hive.1352.1.patch

 rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
 each row.
 --

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1352.1.patch


 As discussed with Venky, rcfilecat needs to add column and line delimiters.




[jira] Commented: (HIVE-1350) hive.query.id is not unique

2010-05-19 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869323#action_12869323
 ] 

John Sichi commented on HIVE-1350:
--

+1.  Will commit if tests pass.


 hive.query.id is not unique 
 

 Key: HIVE-1350
 URL: https://issues.apache.org/jira/browse/HIVE-1350
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1350.1.patch, hive.1350.2.patch


 If two commands are executed by the same user within the same second, they get the same hive.query.id.
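A small sketch of why the collision happens, assuming (hypothetically) an id built from the user name plus a timestamp with one-second resolution, and one possible fix that appends a random UUID. The id formats are illustrative; the attached patches may use a different scheme.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

// Hypothetical id formats; not necessarily the scheme used in hive.1350 patches.
class QueryIdSketch {
  // Collision-prone: the same user in the same second always gets the same id.
  static String secondResolutionId(String user, Date now) {
    return user + "_" + new SimpleDateFormat("yyyyMMddHHmmss").format(now);
  }

  // One possible fix: append a random UUID so ids differ within a second.
  static String uniqueId(String user, Date now) {
    return secondResolutionId(user, now) + "_" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    Date now = new Date();
    // Two commands issued by the same user within the same second.
    System.out.println(secondResolutionId("alice", now).equals(secondResolutionId("alice", now))); // true
    System.out.println(uniqueId("alice", now).equals(uniqueId("alice", now)));                     // false
  }
}
```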




[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread Venky Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869327#action_12869327
 ] 

Venky Iyer commented on HIVE-1352:
--

Why '\r\n'?

 rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
 each row.
 --

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1352.1.patch


 As discussed with Venky, rcfilecat needs to add column and line delimiters.




[jira] Created: (HIVE-1353) load_dyn_part*.q tests need ORDER BY for determinism

2010-05-19 Thread John Sichi (JIRA)
load_dyn_part*.q tests need ORDER BY for determinism


 Key: HIVE-1353
 URL: https://issues.apache.org/jira/browse/HIVE-1353
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Ning Zhang
 Fix For: 0.6.0


I just got a spurious failure from this while testing something else.

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/jsichi/open/commit-trunk/.ptest_0/build/ql/test/logs/clientpositive/load_dyn_part14.q.out /data/users/jsichi/open/commit-trunk/.ptest_0/ql/src/test/results/clientpositive/load_dyn_part14.q.out
[junit] 261,262d260 
[junit]  k1__HIVE_DEFAULT_PARTITION__  
[junit]  k1__HIVE_DEFAULT_PARTITION__  
[junit] 264a263,264 
[junit]  k1__HIVE_DEFAULT_PARTITION__  
[junit]  k1__HIVE_DEFAULT_PARTITION__  
[junit] Exception: Client execution results failed with error code = 1  
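The diff above shows the same rows coming back in a different order. As an illustrative sketch only, with made-up rows: comparing results as ordered lists is sensitive to row order, while sorting first (which is what an ORDER BY in the query guarantees) makes the comparison stable. This is not how the .q test harness compares output; it only shows the underlying issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: order-sensitive vs. order-insensitive comparison.
class DeterministicCompare {
  // Sorted copy of the rows, mimicking the fixed order an ORDER BY guarantees.
  static List<String> ordered(List<String> rows) {
    List<String> copy = new ArrayList<>(rows);
    Collections.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    // Made-up rows: same contents, different order between two runs.
    List<String> expected = Arrays.asList("k1\t__HIVE_DEFAULT_PARTITION__", "k1\tv1", "k2\tv2");
    List<String> actual = Arrays.asList("k1\tv1", "k2\tv2", "k1\t__HIVE_DEFAULT_PARTITION__");
    System.out.println(expected.equals(actual));                   // false
    System.out.println(ordered(expected).equals(ordered(actual))); // true
  }
}
```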





[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869344#action_12869344
 ] 

Edward Capriolo commented on HIVE-1351:
---

This is so nitpicky, but
{noformat}
+--rcfilecat)
+  SERVICE=rcfilecat
+  shift
+  ;;
{noformat}

I do not think we should do this. We are just giving alternate invocations that 
end up being more confusing.

Why should you be able to do this:
{noformat}
hive --rcfilecat
{noformat}

but not
{noformat} 
hive --hwi
{noformat}
?

As for execHiveCmd: if you want to share this, why not move it up into 
bin/hive? We do not need to add a separate shared file when the functions 
defined in bin/hive are already shared.

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Created: (HIVE-1354) partition level properties honored if it exists

2010-05-19 Thread Namit Jain (JIRA)
partition level properties honored if it exists
---

 Key: HIVE-1354
 URL: https://issues.apache.org/jira/browse/HIVE-1354
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


drop table partition_test_partitioned;

create table partition_test_partitioned(key string, value string) partitioned 
by (dt string);

alter table partition_test_partitioned set fileformat rcfile;
insert overwrite table partition_test_partitioned partition(dt=101) select * 
from src1;
show table extended like partition_test_partitioned partition(dt=101);

alter table partition_test_partitioned set fileformat Sequencefile;
insert overwrite table partition_test_partitioned partition(dt=102) select * 
from src1;
show table extended like partition_test_partitioned partition(dt=102);

insert overwrite table partition_test_partitioned partition(dt=101) select * 
from src1;
show table extended like partition_test_partitioned partition(dt=101);

drop table partition_test_partitioned;


Partition (dt=101) still points to RCFile after the second insert, since it was originally created as an RCFile partition.
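A toy model of the behavior described, for illustration only: a new partition takes the table's current file format, while an existing partition keeps the format it was created with. PartitionFormatModel is hypothetical and says nothing about how the metastore actually stores this; the TEXTFILE starting format is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the reported behavior, not Hive's metastore code.
class PartitionFormatModel {
  private String tableFormat;
  private final Map<String, String> partitionFormats = new HashMap<>();

  PartitionFormatModel(String initialTableFormat) {
    this.tableFormat = initialTableFormat;
  }

  // ALTER TABLE ... SET FILEFORMAT: changes only the table-level default.
  void setTableFileFormat(String format) {
    tableFormat = format;
  }

  // INSERT OVERWRITE ... PARTITION(dt=...): a new partition takes the table's
  // current format; an existing partition keeps the format it was created with.
  String insertOverwrite(String dt) {
    return partitionFormats.computeIfAbsent(dt, k -> tableFormat);
  }

  public static void main(String[] args) {
    PartitionFormatModel t = new PartitionFormatModel("TEXTFILE");
    t.setTableFileFormat("RCFILE");
    System.out.println(t.insertOverwrite("101")); // RCFILE
    t.setTableFileFormat("SEQUENCEFILE");
    System.out.println(t.insertOverwrite("102")); // SEQUENCEFILE
    System.out.println(t.insertOverwrite("101")); // RCFILE: partition-level format wins
  }
}
```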




[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869434#action_12869434
 ] 

Namit Jain commented on HIVE-1352:
--

Don't put a TAB after the last column.
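A minimal sketch of the requested row format, assuming rows arrive as lists of already-serialized column strings (an assumption; rcfilecat's actual reader is not shown here). Joining with a tab puts the delimiter only between columns, so there is no trailing TAB; the terminator is a parameter since '\r\n' versus '\n' is still being discussed. RowFormatter is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatter illustrating the delimiters proposed for rcfilecat.
class RowFormatter {
  static final String COLUMN_DELIMITER = "\t";
  static final String DEFAULT_ROW_TERMINATOR = "\r\n"; // as in the issue title

  // String.join puts the TAB only between columns, never after the last one.
  static String formatRow(List<String> columns, String rowTerminator) {
    return String.join(COLUMN_DELIMITER, columns) + rowTerminator;
  }

  static String formatRow(List<String> columns) {
    return formatRow(columns, DEFAULT_ROW_TERMINATOR);
  }

  public static void main(String[] args) {
    System.out.print(formatRow(Arrays.asList("238", "val_238")));
    System.out.print(formatRow(Arrays.asList("86", "val_86"), "\n")); // plain newline variant
  }
}
```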

 rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
 each row.
 --

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1352.1.patch


 As discussed with Venky, rcfilecat needs to add column and line delimiters.




[jira] Created: (HIVE-1355) Hive should use NullOutputFormat for hadoop jobs

2010-05-19 Thread Joydeep Sen Sarma (JIRA)
Hive should use NullOutputFormat for hadoop jobs


 Key: HIVE-1355
 URL: https://issues.apache.org/jira/browse/HIVE-1355
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


See https://issues.apache.org/jira/browse/MAPREDUCE-1802

Hive doesn't depend on the Hadoop job output folder; it produces output 
exclusively via side-effect folders. We should use an OutputFormat that can 
request that Hadoop skip cleanup/setup. This could be NullOutputFormat (unless 
there are objections in Hadoop to changing NullOutputFormat's behavior).

As a small side benefit, it also avoids some totally unnecessary file creates 
and deletes in HDFS.




[jira] Updated: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1352:
---

Attachment: hive.1352.2.patch

 rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
 each row.
 --

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1352.1.patch, hive.1352.2.patch


 As discussed with Venky, rcfilecat needs to add column and line delimiters.




[jira] Created: (HIVE-1356) Allow uncommitted inserts and commit explicitly

2010-05-19 Thread Raghotham Murthy (JIRA)
Allow uncommitted inserts and commit explicitly
---

 Key: HIVE-1356
 URL: https://issues.apache.org/jira/browse/HIVE-1356
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Raghotham Murthy


Uncommitted inserts should not show up in SHOW TABLES, SHOW PARTITIONS, etc. We 
would like to use an explicit commit to make partitions/tables visible after we 
have inserted all the data that we want. This feature becomes important when 
there are multi-partition or multi-table inserts. Consumers of the 
tables/partitions can then wait on just one of the partitions (or a top-level 
partition) and be certain that they will not start reading a table while it is 
still being written to.
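A hypothetical in-memory sketch of the proposed semantics: inserts only stage partitions, SHOW PARTITIONS lists committed ones, and commit() publishes all staged partitions together. StagedCatalog and its methods are invented for illustration; a real implementation would need the metastore to make the publish step atomic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical catalog: inserted partitions stay invisible until commit().
class StagedCatalog {
  private final Set<String> visible = new LinkedHashSet<>();
  private final Set<String> pending = new LinkedHashSet<>();

  // An insert only stages the partition.
  synchronized void insert(String partition) {
    pending.add(partition);
  }

  // What SHOW PARTITIONS would list: committed partitions only.
  synchronized List<String> showPartitions() {
    return new ArrayList<>(visible);
  }

  // Publishes every staged partition in one step, so a reader never sees
  // only part of a multi-partition insert.
  synchronized void commit() {
    visible.addAll(pending);
    pending.clear();
  }

  public static void main(String[] args) {
    StagedCatalog catalog = new StagedCatalog();
    catalog.insert("ds=2010-05-19/hr=00");
    catalog.insert("ds=2010-05-19/hr=01");
    System.out.println(catalog.showPartitions()); // []
    catalog.commit();
    System.out.println(catalog.showPartitions()); // [ds=2010-05-19/hr=00, ds=2010-05-19/hr=01]
  }
}
```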




[jira] Commented: (HIVE-1352) rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.

2010-05-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869481#action_12869481
 ] 

Namit Jain commented on HIVE-1352:
--

+1


 rcfilecat should use '\t' to separate columns and print '\r\n' at the end of 
 each row.
 --

 Key: HIVE-1352
 URL: https://issues.apache.org/jira/browse/HIVE-1352
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive.1352.1.patch, hive.1352.2.patch


 As discussed with Venky, rcfilecat needs to add column and line delimiters.
