[jira] Created: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-05 He Yongqiang (JIRA)
archive is not working when multiple partitions inside one table are archived.
--

 Key: HIVE-1515
 URL: https://issues.apache.org/jira/browse/HIVE-1515
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


set hive.exec.compress.output = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size=256;
set mapred.min.split.size.per.node=256;
set mapred.min.split.size.per.rack=256;
set mapred.max.split.size=256;

set hive.archive.enabled = true;

drop table combine_3_srcpart_seq_rc;

create table combine_3_srcpart_seq_rc (key int, value string)
partitioned by (ds string, hr string) stored as sequencefile;

insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", hr="00")
select * from src;

insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", hr="001")
select * from src;

-- archive both partitions of the same table
ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", hr="00");
ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", hr="001");

-- read from both archived partitions in a single query
select key, value, ds, hr from combine_3_srcpart_seq_rc where ds="2010-08-03"
order by key, hr limit 30;

drop table combine_3_srcpart_seq_rc;


Running this script fails with:

java.io.IOException: Invalid file name: 
har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
 in 
har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har

It fails because the query above has two input paths, one for each partition:
1) 
har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
2) 
har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
However, calling Path.getFileSystem() on these two input paths returns the same 
FileSystem instance for both, and that instance points at the archive of the 
first caller, in this case 
har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har

This happens because Hadoop's FileSystem keeps a global cache, and when it loads 
a FileSystem instance for a given path, it looks up the cache using only the 
path's scheme and the username (plus the URI authority, which is empty for these 
har: paths). So when we call Path.getFileSystem() on the second har path, it 
actually returns the FileSystem handle created for the first path.
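The cache collision described above can be modeled in a few lines of Java. This is a simplified sketch, not Hadoop's code: `FakeHarFs`, `cacheKey`, `archiveRootOf` and `getFileSystem` here are hypothetical names, the key is assumed to be the scheme/authority/username triple, and the shortened warehouse paths are illustrative only.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of the HIVE-1515 cache collision.
 * FakeHarFs stands in for a har file system bound to one archive;
 * it is not Hadoop's HarFileSystem.
 */
public class FsCacheCollision {

    /** Stand-in for a file system instance bound to a single .har archive. */
    static final class FakeHarFs {
        final String archiveRoot;
        FakeHarFs(String archiveRoot) { this.archiveRoot = archiveRoot; }
    }

    /** Global cache shared by every caller, like Hadoop's FileSystem cache. */
    static final Map<String, FakeHarFs> CACHE = new HashMap<>();

    /** Key built from scheme, authority and user only; the path is ignored. */
    static String cacheKey(URI uri, String user) {
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
        return scheme + "|" + authority + "|" + user;
    }

    /** The archive a path lives in: its path up to and including ".har". */
    static String archiveRootOf(URI uri) {
        String path = uri.getPath();
        int end = path.indexOf(".har");
        return end < 0 ? path : path.substring(0, end + ".har".length());
    }

    /** Mimics a cached lookup: the first caller for a key decides the archive. */
    static FakeHarFs getFileSystem(URI uri, String user) {
        return CACHE.computeIfAbsent(cacheKey(uri, user),
                k -> new FakeHarFs(archiveRootOf(uri)));
    }

    public static void main(String[] args) {
        URI hr00 = URI.create("har:/warehouse/t/ds=2010-08-03/hr=00/data.har/warehouse/t/ds=2010-08-03/hr=00");
        URI hr001 = URI.create("har:/warehouse/t/ds=2010-08-03/hr=001/data.har/warehouse/t/ds=2010-08-03/hr=001");

        FakeHarFs first = getFileSystem(hr00, "hive");
        FakeHarFs second = getFileSystem(hr001, "hive");

        // Both har: URIs have no authority, so they map to the same cache key.
        System.out.println("same instance: " + (first == second));
        // The second partition ends up bound to the first partition's archive.
        System.out.println("second path uses: " + second.archiveRoot);
    }
}
```

Running `main` prints `same instance: true` and shows the hr=001 path bound to the hr=00 archive, which mirrors the "Invalid file name ... in .../hr=00/data.har" error. As a hypothetical variation (not something proposed in this issue), adding the archive root to `cacheKey` would give each partition its own instance.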


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895959#action_12895959
 ] 

Joydeep Sen Sarma commented on HIVE-1513:
-

Yes, it's possible. However, a lot of variables are already initialized by the 
time we get to loading ext/*.sh. For example, we allow HADOOP_HEAPSIZE to be 
specified via an environment variable, but aside from exporting it before 
launching the hive script, there's no way to configure it externally. The 
ext/* trick wouldn't work because it comes too late.

I think this is simple enough: we can just source a conf/hive-env.sh or 
something of the sort, so that admins can provide the right values for all 
these variables, based on their requirements, via config files.

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>
> It's difficult to add environment variables to the Hive starter scripts except 
> by modifying the scripts directly, which is undesirable since they are source 
> code. The Hive starter scripts should load an admin-supplied shell script for 
> configurability, similar to what Hadoop does with hadoop-env.sh.




Hive Contributors Meeting August 9th @ Facebook

2010-08-05 Carl Steinbach
Hi,

This is a reminder that the next Hive Contributors Meeting will convene on
Monday, August 9th at 4pm at Facebook HQ. Space is limited, so if you plan to
attend you *must* RSVP here:

http://www.meetup.com/Hive-Contributors-Group/

The following is a preliminary agenda for the meeting.
Please email me if you want to add something to the list.

* Update on the 0.6.0 release
* Documentation policies
* Automated patch testing
* Moving to the 0.20 Hadoop API and removing support for pre-0.20 versions
* Updates on recent/continuing work:
 * Howl
 * Indexes
 * Filter pushdown

Thanks.

Carl


[jira] Commented: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version

2010-08-05 John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895829#action_12895829
 ] 

John Sichi commented on HIVE-1512:
--

This patch can't be applied until we actually upgrade the HBase libs, since it 
is incompatible with 0.20.3. I'll link it to HIVE-1235.

Also, when supplying patches, please base them off of hive trunk (not off of a 
subdirectory).

Thanks!


> Need to get hive_hbase-handler to work with hbase versions 0.20.4  0.20.5 and 
> cloudera CDH3 version
> ---
>
> Key: HIVE-1512
> URL: https://issues.apache.org/jira/browse/HIVE-1512
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Jimmy Hu
> Fix For: 0.7.0
>
> Attachments: HIVE-1512.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The current trunk hive_hbase-handler only works with HBase 0.20.3; we need 
> to get it to work with HBase versions 0.20.4, 0.20.5 and the Cloudera CDH3 version.




[jira] Assigned: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version

2010-08-05 John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1512:


Assignee: John Sichi

> Need to get hive_hbase-handler to work with hbase versions 0.20.4  0.20.5 and 
> cloudera CDH3 version
> ---
>
> Key: HIVE-1512
> URL: https://issues.apache.org/jira/browse/HIVE-1512
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Jimmy Hu
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1512.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The current trunk hive_hbase-handler only works with HBase 0.20.3; we need 
> to get it to work with HBase versions 0.20.4, 0.20.5 and the Cloudera CDH3 version.




[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Status: Patch Available  (was: Open)

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Attachment: hive-1514.1.patch

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch
>
>





[jira] Created: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 He Yongqiang (JIRA)
Be able to modify a partition's fileformat and file location information.
-

 Key: HIVE-1514
 URL: https://issues.apache.org/jira/browse/HIVE-1514
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang







[jira] Commented: (HIVE-1374) Query compile-only option

2010-08-05 Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895787#action_12895787
 ] 

Siying Dong commented on HIVE-1374:
---

I talked with the people who asked for this feature. Basically, they want a 
mode that only checks syntax and does not fail when a table or partition does 
not exist; put more simply, a mode that does not go to the metastore to check 
objects at all.

> Query compile-only option
> -
>
> Key: HIVE-1374
> URL: https://issues.apache.org/jira/browse/HIVE-1374
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Siying Dong
>
> A compile-only option might be useful for helping users quickly prototype 
> queries, fix errors, and do test runs. The proposed change would be adding a 
> -c switch that behaves like -e but only compiles the specified query.




[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Status: Patch Available  (was: Open)

This patch has full read/write functionality. I am going to do another patch 
later today with xdocs, but do not expect any code changes.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt
>
>
> Add a Cassandra storage handler.




[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-3-patch.txt

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt
>
>
> Add a Cassandra storage handler.




[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895698#action_12895698
 ] 

Edward Capriolo commented on HIVE-1513:
---

Anything you put in bin/ext is sourced as part of the bootstrap process. 
Could you do something like bin/ext/mystuff.sh?

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>
> It's difficult to add environment variables to the Hive starter scripts except 
> by modifying the scripts directly, which is undesirable since they are source 
> code. The Hive starter scripts should load an admin-supplied shell script for 
> configurability, similar to what Hadoop does with hadoop-env.sh.




[jira] Created: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Joydeep Sen Sarma (JIRA)
hive starter scripts should load admin/user supplied script for configurability
---

 Key: HIVE-1513
 URL: https://issues.apache.org/jira/browse/HIVE-1513
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Joydeep Sen Sarma


It's difficult to add environment variables to the Hive starter scripts except 
by modifying the scripts directly, which is undesirable since they are source 
code. The Hive starter scripts should load an admin-supplied shell script for 
configurability, similar to what Hadoop does with hadoop-env.sh.




[jira] Updated: (HIVE-1509) Monitor the working set of the number of files

2010-08-05 Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1509:


Affects Version/s: 0.6.0
   (was: 0.7.0)

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>





[jira] Updated: (HIVE-1509) Monitor the working set of the number of files

2010-08-05 Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1509:


   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks, Ning.

The test problems were likely caused by a problem applying the patch.

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>

