[jira] Updated: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1408:

    Attachment: 1408.7.patch

add option to let hive automatically run in local mode based on tunable heuristics

Key: HIVE-1408
URL: https://issues.apache.org/jira/browse/HIVE-1408
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 1408.7.patch, hive-1408.6.patch

As a followup to HIVE-543, we should have a simple option (enabled by default) to let Hive run in local mode if possible. Two levels of options are desirable:

1. hive.exec.mode.local.auto=true/false  // control whether local mode is automatically chosen
2. Options to control different heuristics, some naive examples:
   hive.exec.mode.local.auto.input.size.max=1G  // don't choose local mode if input data exceeds 1G
   hive.exec.mode.local.auto.script.enable=true/false  // control whether local mode is allowed for queries with user scripts

This can be implemented as a pre/post execution hook. It makes sense to provide this as a standard hook in the Hive codebase since it's likely to improve response time for many users (especially for test queries). The initial proposal is to choose this at the query level and not at the per-hive-task (i.e. Hadoop job) level. The per-job level requires more changes to compilation (to not pre-commit to HDFS or local scratch directories at compile time).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
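The decision logic the issue describes can be sketched as follows. This is a hypothetical illustration, not the actual patch: the class and method names are invented, and only the configuration names come from the issue text.

```java
// Hypothetical sketch of the proposed heuristic (not Hive's actual code):
// decide per query whether to run in local mode, based on tunable limits.
public class LocalModeHeuristic {
    static final long GB = 1024L * 1024L * 1024L;

    // hive.exec.mode.local.auto
    boolean autoLocalEnabled = true;
    // hive.exec.mode.local.auto.input.size.max (1G in the issue's example)
    long maxInputBytes = 1 * GB;
    // hive.exec.mode.local.auto.script.enable
    boolean allowUserScripts = false;

    /** Returns true if the whole query should run in local mode. */
    boolean chooseLocalMode(long totalInputBytes, boolean hasUserScript) {
        if (!autoLocalEnabled) return false;                 // feature off
        if (totalInputBytes > maxInputBytes) return false;   // too much data
        if (hasUserScript && !allowUserScripts) return false; // scripts opt-out
        return true;
    }
}
```

Because the proposal decides at the query level, a check like this would run once in a pre-execution hook, before any Hadoop job of the query is launched.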
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893088#action_12893088 ]

Joydeep Sen Sarma commented on HIVE-1408:

#1 - we decided that I would try to take ProxyFileSystem out of the hive jars in the distribution. Unfortunately, I am unable to do so - all the simple ways seem to break the tests. I don't see much of a downside with the current arrangement: ProxyFileSystem is test-only code, and there's no reason anyone should invoke it, so it shouldn't cause any problems (even though it ships with the hive jars). The pfile:// - ProxyFileSystem mapping exists only in test mode.

btw - I can't use ShimLoader, because Hadoop doesn't specify a factory class for creating file system objects; it expects a file system class directly. That makes it impossible to write a portable filesystem class using the ShimLoader paradigm. I am beginning to appreciate factory classes more.

#2 - not an issue; can't use ShimLoader as per above.
#3 - fixed.
#4, #5, #6, #7, #8 - not an issue, as we discussed. HIVE-1484 has already been filed as followup work to use a local dir for intermediate data when possible.
#9 - fixed. Moved one public function to Utility.java and eliminated the other.
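The factory-vs-class distinction the comment turns on can be made concrete. The sketch below uses invented names (FileSystemLike, FsFactory) purely for illustration; it is not Hadoop's or Hive's actual API.

```java
// Illustration (hypothetical names) of the constraint described above:
// Hadoop binds a URI scheme to a concrete FileSystem *class* that it
// instantiates reflectively; there is no factory interface in between
// where a ShimLoader could substitute a version-specific implementation.
public class FactoryVsClass {
    interface FileSystemLike { String scheme(); }

    public static class ProxyFs implements FileSystemLike {
        public ProxyFs() {}
        public String scheme() { return "pfile"; }
    }

    // What Hadoop effectively does: instantiate the configured class directly.
    static FileSystemLike createByClass(Class<? extends FileSystemLike> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // What a portable, shim-loaded filesystem would need instead: a factory
    // type, so the returned implementation could vary per Hadoop version.
    interface FsFactory { FileSystemLike create(); }

    static FileSystemLike createByFactory(FsFactory f) { return f.create(); }
}
```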
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893170#action_12893170 ]

Bennie Schut commented on HIVE-1126:

I keep getting errors on my test runs from the test testCliDriver_loadpart_err, which seem unrelated to my changes.

Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.

Key: HIVE-1126
URL: https://issues.apache.org/jira/browse/HIVE-1126
Project: Hadoop Hive
Issue Type: Improvement
Components: Clients
Reporter: Bennie Schut
Assignee: Bennie Schut
Fix For: 0.7.0
Attachments: HIVE-1126-1.patch, HIVE-1126-2.patch, HIVE-1126-3.patch, HIVE-1126-4.patch, HIVE-1126-5.patch, HIVE-1126-6.patch, HIVE-1126.patch, HIVE-1126_patch(0.5.0_source).patch

I've been using the Hive JDBC driver more and more and was missing some functionality, which I added:

HiveDatabaseMetaData.getTables - uses "show tables" to get the info from Hive.
HiveDatabaseMetaData.getColumns - uses "describe tablename" to get the columns.

This makes using something like SQuirreL a lot nicer, since you have the list of tables and can just click on the content tab to see what's in a table. I also implemented HiveResultSet.getObject(String columnName) so you can call most get* methods based on the column name.
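The getObject(String) part of the patch boils down to resolving a column name to a 1-based JDBC index and delegating to the positional getter. The sketch below shows that resolution pattern in a self-contained form; it is an illustration of the technique, not the HiveResultSet source, and the class and column names are invented.

```java
// Sketch of name-to-index column resolution, the pattern behind
// ResultSet.getObject(String columnName). Hypothetical, simplified class.
import java.util.*;

public class NameToIndex {
    private final List<String> columnNames;
    private final List<Object> row;

    NameToIndex(List<String> names, List<Object> row) {
        this.columnNames = names;
        this.row = row;
    }

    // JDBC column indexes are 1-based; names match case-insensitively.
    int findColumn(String name) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(name)) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("Unknown column: " + name);
    }

    Object getObject(int index) { return row.get(index - 1); }

    // The name-based getter just delegates to the positional one.
    Object getObject(String name) { return getObject(findColumn(name)); }
}
```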
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893175#action_12893175 ]

Amr Awadallah commented on HIVE-1126:

I am out of office on vacation and will be slower than usual in responding to emails. If this is urgent then please call my cell phone (or send an sms), otherwise I will reply to your email when I get back. Thanks for your patience,

-- amr
Build failed in Hudson: Hive-trunk-h0.17 #505
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/505/changes

Changes: [nzhang] HIVE-1425. hive.task.progress should be added to conf/hive-default.xml (John Sichi via Ning Zhang)

[...truncated 9371 lines...]

[junit] (repetitive test-setup log elided: loading data into src, src1, srcpart, srcbucket, srcbucket2, src_sequencefile, src_thrift and src_json, with POSTHOOK output and OK lines for each; diffs recorded for unknown_function4.q and unknown_table1.q; query unknown_table2.q begun)
Build failed in Hudson: Hive-trunk-h0.19 #507
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/507/changes

Changes: [nzhang] HIVE-1425. hive.task.progress should be added to conf/hive-default.xml (John Sichi via Ning Zhang)

[...truncated 12080 lines...]

[junit] (repetitive test-setup log elided: loading data into src, src1, srcpart, srcbucket, srcbucket2, src_sequencefile, src_thrift and src_json, with POSTHOOK output and OK lines for each; diffs recorded for unknown_function4.q and unknown_table1.q; query unknown_table2.q begun)
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893275#action_12893275 ]

John Sichi commented on HIVE-1126:

@Bennie: yeah, it is flaking for me too. It has been flaky forever but seems to have gotten worse for me recently. I've logged HIVE-1491 to disable it, but until we get that done, you can just delete loadpart_err.q and loadpart_err.q.out before running "ant package test".
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893291#action_12893291 ]

Ning Zhang commented on HIVE-1408:

Looks good in general. One minor thing though: I tried it on real clusters and it works great, except that I need to manually set mapred.local.dir even though hive.exec.mode.local.auto is already set to true. Should we treat mapred.local.dir the same as HADOOPJT, so that it can be set automatically when local mode is on and reset back in Driver and Context?
[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1126:

    Attachment: HIVE-1126-7.patch

New patch with a fixed test. Also switched the actual/expected values so they are now in the correct order, and added some messages that should make any failing test clearer.
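Why swapped actual/expected values matter: JUnit-style assertEquals takes (expected, actual), and reversing them makes failure messages report the two values backwards. The snippet below is purely illustrative (it mimics the shape of the message, not JUnit's exact code):

```java
// Illustrative only: the failure message an assertEquals(expected, actual)
// style assertion builds. With the arguments reversed, a failing test
// blames the wrong value.
public class AssertOrder {
    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }
}
```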
[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1126:

    Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893314#action_12893314 ]

Joydeep Sen Sarma commented on HIVE-1408:

yeah - so the solution is that mapred.local.dir needs to be set correctly in the hive/hadoop client-side xml. For our internal install, I will send a diff changing the client side to point to /tmp (instead of having a server-side config). There's nothing to do in the Hive open source version: mapred.local.dir is a client-only variable and needs to be set specifically on the client side by the admin. Basically, our internal client-side config has a bug :-)
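A hedged sketch of the kind of client-side setting the comment describes (the property name is from the discussion; the file placement and path value are hypothetical examples, not the actual internal diff):

```xml
<!-- Client-side configuration sketch (e.g. in the Hive client's
     mapred-site.xml); /tmp is the example location mentioned above. -->
<property>
  <name>mapred.local.dir</name>
  <value>/tmp</value>
  <description>Scratch space for local-mode map/reduce tasks;
    must be set on the client since local mode runs there.</description>
</property>
```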
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1289#action_1289 ]

John Sichi commented on HIVE-417:

Thanks Yongqiang. Looking at it now.

Implement Indexing in Hive

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, hive.indexing.12.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:

    Attachment: cas-handle.tar.gz

This is not a quality patch yet. I am still experimenting with some ideas. Everything is free-form and will likely change before the final patch. There are a few junk files (HiveIColumn, etc.) which will not be part of the release. Thus far:

CassandraSplit.java
HiveCassandraTableInputFormat.java
CassandraSerDe.java
TestColumnFamilyInputFormat.java
TestCassandraPut.java

are working and can give you an idea of where the code is going.

Cassandra Storage Handler

Key: HIVE-1434
URL: https://issues.apache.org/jira/browse/HIVE-1434
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Attachments: cas-handle.tar.gz, hive-1434-1.txt

Add a cassandra storage handler.
Fwd: [howldev] Initial thoughts on authorization in howl
Begin forwarded message:

From: Pradeep Kamath <prade...@yahoo-inc.com>
Date: July 27, 2010 4:38:42 PM PDT
To: howl...@yahoogroups.com
Subject: [howldev] Initial thoughts on authorization in howl
Reply-To: howl...@yahoogroups.com

The initial thoughts on authorization in howl are to model authorization (for DDL ops like create table/drop table/add partition etc.) after hdfs permissions. To be able to do this, we would like to extend createTable() to add the ability to record a different group from the user's primary group, and to record the complete unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with hive, I propose having conf variables to achieve these objectives:

- table.group.name: the value will indicate the name of the unix group for the table directory. This will be used by createTable() to perform a chgrp to the value provided. This property will give the user the ability to choose which of the many unix groups he is part of to associate with the table.
- table.permissions: the value will be of the form rwxrwxrwx to indicate read-write-execute permissions on the table directory. This will be used by createTable() to perform a chmod to the value provided. This will let the user decide what permissions he wants on the table.
- partitions.inherit.permissions: a value of true will indicate that partitions inherit the group name and permissions of the table-level directory. This will be used by addPartition() to perform a chgrp and chmod to the values as on the table directory.

I favor conf properties over API changes since the complete authorization design for hive is not finalized yet. These properties can be deprecated/removed when that is in place. These properties would also be useful to some installations of vanilla hive, since at least DFS-level authorization can now be achieved by hive without the user having to manually perform chgrp and chmod operations on DFS.

I would like to hear from hive developers/committers whether this would be acceptable for hive, and also thoughts from others.

Pradeep
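The proposed conf variables could look like the following sketch. Only the property names come from the proposal; the values are hypothetical examples, and the proposal itself is not finalized:

```xml
<!-- Sketch of the proposed howl metastore properties; example values only. -->
<property>
  <name>table.group.name</name>
  <value>analytics</value><!-- unix group createTable() would chgrp to -->
</property>
<property>
  <name>table.permissions</name>
  <value>rwxrwxr-x</value><!-- mode createTable() would chmod to -->
</property>
<property>
  <name>partitions.inherit.permissions</name>
  <value>true</value><!-- addPartition() copies the table dir's group/mode -->
</property>
```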
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893402#action_12893402 ]

John Sichi commented on HIVE-417:

+1. Will commit when tests pass. I noticed a number of trivial issues (like Javadoc mismatches) which I'll put in a followup.
[jira] Updated: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-417: Fix Version/s: 0.7.0
[jira] Created: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
FileSinkOperator should remove duplicated files from the same task based on file sizes -- Key: HIVE-1492 URL: https://issues.apache.org/jira/browse/HIVE-1492 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
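The retention logic described in the issue can be sketched as follows. This is a hedged illustration, not Hive's actual `Utilities.removeTempOrDuplicateFiles()` implementation: the class, method, and file names below are hypothetical, and a real implementation would parse Hadoop task-attempt names and consult the `FileSystem` for sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: among duplicate output files produced by the same task (failed
// attempts, speculative runs), keep only the largest file per task.
public class KeepLargest {
    // files maps an attempt file name (task id plus a trailing "_<attempt>"
    // suffix) to its size in bytes; returns the one file kept per task.
    static Map<String, Long> keepLargestPerTask(Map<String, Long> files) {
        // best: task id -> (file name, size) of the largest file seen so far
        TreeMap<String, Map.Entry<String, Long>> best = new TreeMap<>();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            String name = e.getKey();
            String taskId = name.substring(0, name.lastIndexOf('_'));
            Map.Entry<String, Long> cur = best.get(taskId);
            if (cur == null || e.getValue() > cur.getValue()) {
                best.put(taskId, e);
            }
        }
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : best.values()) {
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("task_000001_0", 100L); // first attempt wrote partial output
        files.put("task_000001_1", 250L); // retry wrote the complete file
        files.put("task_000002_0", 300L); // single attempt, no duplicates
        // prints {task_000001_1=250, task_000002_0=300}
        System.out.println(keepLargestPerTask(files));
    }
}
```

Keeping the largest file is a heuristic: a partial file from a failed attempt is shorter than the complete output, so size alone distinguishes them without reading the data.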
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893455#action_12893455 ] Joydeep Sen Sarma commented on HIVE-417: i am waiting for a commit on hive-1408. that's probably gonna collide.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1492: - Attachment: HIVE-1492.patch
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1492: - Status: Patch Available (was: Open) Affects Version/s: 0.7.0
[jira] Assigned: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1492: Assignee: Ning Zhang
[jira] Commented: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893460#action_12893460 ] He Yongqiang commented on HIVE-1492: +1, looks good. will commit after tests pass.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893461#action_12893461 ] John Sichi commented on HIVE-417: - Thanks Joydeep. Yeah, this one has tons of plan diffs due to the virtual columns.
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893462#action_12893462 ] Joydeep Sen Sarma commented on HIVE-1408: - Ning - anything else u need from me? i was hoping to get it in before hive-417. otherwise i am sure would have to regenerate/reconcile a ton of stuff
[jira] Resolved: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1408. -- Fix Version/s: 0.7.0 Resolution: Fixed Committed. Thanks Joydeep!
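For reference, the two levels of options proposed in the HIVE-1408 description could be set in a user's hive-site.xml roughly as follows. This is an illustrative sketch only: `hive.exec.mode.local.auto` is the switch named in the issue, while the max-input-size property name and the `1G` value are taken verbatim from the issue's own example and may not match the config key actually committed.

```
<!-- Illustrative hive-site.xml fragment (names/values from the HIVE-1408
     description, not verified against the committed patch) -->
<property>
  <name>hive.exec.mode.local.auto</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.mode.local.auto.input.size.max</name>
  <value>1G</value>
</property>
```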
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893488#action_12893488 ] John Sichi commented on HIVE-417: - Yongqiang, I passed tests on Hadoop 0.20, but Ning has committed HIVE-1408, which conflicts, so you'll need to rebase against that and then I'll try again.