[jira] Updated: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1408:

    Attachment: 1408.7.patch

add option to let hive automatically run in local mode based on tunable heuristics

Key: HIVE-1408
URL: https://issues.apache.org/jira/browse/HIVE-1408
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 1408.7.patch, hive-1408.6.patch

As a followup to HIVE-543, we should have a simple option (enabled by default) to let Hive run in local mode if possible. Two levels of options are desirable:

1. hive.exec.mode.local.auto=true/false  // control whether local mode is automatically chosen
2. Options to control different heuristics, some naive examples:
   hive.exec.mode.local.auto.input.size.max=1G  // don't choose local mode if input data exceeds 1G
   hive.exec.mode.local.auto.script.enable=true/false  // control whether local mode is allowed for queries with user scripts

This can be implemented as a pre/post execution hook. It makes sense to provide this as a standard hook in the Hive codebase since it's likely to improve response time for many users (especially for test queries). The initial proposal is to choose this at the query level and not at the per-hive-task (i.e. Hadoop job) level. The per-job level requires more changes to compilation (to not pre-commit to HDFS or local scratch directories at compile time).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
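The decision logic the issue describes can be sketched as follows. This is a hypothetical illustration, not the actual patch: the class and method names are invented, and only the configuration names come from the issue text.

```java
// Hypothetical sketch of the proposed heuristic (not Hive's actual code):
// decide per query whether to run in local mode, based on tunable limits.
public class LocalModeHeuristic {
    static final long GB = 1024L * 1024L * 1024L;

    // hive.exec.mode.local.auto
    boolean autoLocalEnabled = true;
    // hive.exec.mode.local.auto.input.size.max (1G in the issue's example)
    long maxInputBytes = 1 * GB;
    // hive.exec.mode.local.auto.script.enable
    boolean allowUserScripts = false;

    /** Returns true if the whole query should run in local mode. */
    boolean chooseLocalMode(long totalInputBytes, boolean hasUserScript) {
        if (!autoLocalEnabled) return false;                 // feature off
        if (totalInputBytes > maxInputBytes) return false;   // too much data
        if (hasUserScript && !allowUserScripts) return false; // scripts opt-out
        return true;
    }
}
```

Because the proposal decides at the query level, a check like this would run once in a pre-execution hook, before any Hadoop job of the query is launched.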
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893088#action_12893088 ]

Joydeep Sen Sarma commented on HIVE-1408:

#1 - we decided that I would try to take ProxyFileSystem out of the hive jars in the distribution. Unfortunately, I am unable to do so - all the simple ways seem to break the tests. I don't see much of a downside with the current arrangement: ProxyFileSystem is test-only code, and there's no reason anyone should invoke it, so it shouldn't cause any problems (even though it ships with the hive jars). The pfile:// - ProxyFileSystem mapping exists only in test mode.

btw - I can't use ShimLoader, because Hadoop doesn't specify a factory class for creating file system objects; it expects a file system class directly. That makes it impossible to write a portable filesystem class using the ShimLoader paradigm. I am beginning to appreciate factory classes more.

#2 - not an issue; can't use ShimLoader as per above.
#3 - fixed.
#4, #5, #6, #7, #8 - not an issue, as we discussed. HIVE-1484 has already been filed as followup work to use a local dir for intermediate data when possible.
#9 - fixed. Moved one public function to Utility.java and eliminated the other.
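The factory-vs-class distinction the comment turns on can be made concrete. The sketch below uses invented names (FileSystemLike, FsFactory) purely for illustration; it is not Hadoop's or Hive's actual API.

```java
// Illustration (hypothetical names) of the constraint described above:
// Hadoop binds a URI scheme to a concrete FileSystem *class* that it
// instantiates reflectively; there is no factory interface in between
// where a ShimLoader could substitute a version-specific implementation.
public class FactoryVsClass {
    interface FileSystemLike { String scheme(); }

    public static class ProxyFs implements FileSystemLike {
        public ProxyFs() {}
        public String scheme() { return "pfile"; }
    }

    // What Hadoop effectively does: instantiate the configured class directly.
    static FileSystemLike createByClass(Class<? extends FileSystemLike> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // What a portable, shim-loaded filesystem would need instead: a factory
    // type, so the returned implementation could vary per Hadoop version.
    interface FsFactory { FileSystemLike create(); }

    static FileSystemLike createByFactory(FsFactory f) { return f.create(); }
}
```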
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893170#action_12893170 ]

Bennie Schut commented on HIVE-1126:

I keep getting errors on my test runs from the test testCliDriver_loadpart_err, which seem unrelated to my changes.

Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.

Key: HIVE-1126
URL: https://issues.apache.org/jira/browse/HIVE-1126
Project: Hadoop Hive
Issue Type: Improvement
Components: Clients
Reporter: Bennie Schut
Assignee: Bennie Schut
Fix For: 0.7.0
Attachments: HIVE-1126-1.patch, HIVE-1126-2.patch, HIVE-1126-3.patch, HIVE-1126-4.patch, HIVE-1126-5.patch, HIVE-1126-6.patch, HIVE-1126.patch, HIVE-1126_patch(0.5.0_source).patch

I've been using the Hive JDBC driver more and more and was missing some functionality, which I added:

HiveDatabaseMetaData.getTables - uses "show tables" to get the info from Hive.
HiveDatabaseMetaData.getColumns - uses "describe tablename" to get the columns.

This makes using something like SQuirreL a lot nicer, since you have the list of tables and can just click on the content tab to see what's in a table. I also implemented HiveResultSet.getObject(String columnName) so you can call most get* methods based on the column name.
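The getObject(String) part of the patch boils down to resolving a column name to a 1-based JDBC index and delegating to the positional getter. The sketch below shows that resolution pattern in a self-contained form; it is an illustration of the technique, not the HiveResultSet source, and the class and column names are invented.

```java
// Sketch of name-to-index column resolution, the pattern behind
// ResultSet.getObject(String columnName). Hypothetical, simplified class.
import java.util.*;

public class NameToIndex {
    private final List<String> columnNames;
    private final List<Object> row;

    NameToIndex(List<String> names, List<Object> row) {
        this.columnNames = names;
        this.row = row;
    }

    // JDBC column indexes are 1-based; names match case-insensitively.
    int findColumn(String name) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(name)) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("Unknown column: " + name);
    }

    Object getObject(int index) { return row.get(index - 1); }

    // The name-based getter just delegates to the positional one.
    Object getObject(String name) { return getObject(findColumn(name)); }
}
```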
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893175#action_12893175 ]

Amr Awadallah commented on HIVE-1126:

I am out of office on vacation and will be slower than usual in responding to emails. If this is urgent then please call my cell phone (or send an sms), otherwise I will reply to your email when I get back. Thanks for your patience,

-- amr
Build failed in Hudson: Hive-trunk-h0.17 #505
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/505/changes

Changes: [nzhang] HIVE-1425. hive.task.progress should be added to conf/hive-default.xml (John Sichi via Ning Zhang)

[...truncated 9371 lines...]

[junit] (repetitive test-setup log elided: loading data into src, src1, srcpart, srcbucket, srcbucket2, src_sequencefile, src_thrift and src_json, with POSTHOOK output and OK lines for each; diffs recorded for unknown_function4.q and unknown_table1.q; query unknown_table2.q begun)
Build failed in Hudson: Hive-trunk-h0.19 #507
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/507/changes

Changes: [nzhang] HIVE-1425. hive.task.progress should be added to conf/hive-default.xml (John Sichi via Ning Zhang)

[...truncated 12080 lines...]

[junit] (repetitive test-setup log elided: loading data into src, src1, srcpart, srcbucket, srcbucket2, src_sequencefile, src_thrift and src_json, with POSTHOOK output and OK lines for each; diffs recorded for unknown_function4.q and unknown_table1.q; query unknown_table2.q begun)
[jira] Commented: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893275#action_12893275 ]

John Sichi commented on HIVE-1126:

@Bennie: yeah, it is flaking for me too. It has been flaky forever but seems to have gotten worse for me recently. I've logged HIVE-1491 to disable it, but until we get that done, you can just delete loadpart_err.q and loadpart_err.q.out before running "ant package test".
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893291#action_12893291 ]

Ning Zhang commented on HIVE-1408:

Looks good in general. One minor thing though: I tried it on real clusters and it works great, except that I need to manually set mapred.local.dir even though hive.exec.mode.local.auto is already set to true. Should we treat mapred.local.dir the same as HADOOPJT, so that it can be set automatically when local mode is on and reset back in Driver and Context?
[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1126:

    Attachment: HIVE-1126-7.patch

New patch with a fixed test. Also switched the actual/expected values so they are now in the correct order, and added some messages that should make any failing test clearer.
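Why swapped actual/expected values matter: JUnit-style assertEquals takes (expected, actual), and reversing them makes failure messages report the two values backwards. The snippet below is purely illustrative (it mimics the shape of the message, not JUnit's exact code):

```java
// Illustrative only: the failure message an assertEquals(expected, actual)
// style assertion builds. With the arguments reversed, a failing test
// blames the wrong value.
public class AssertOrder {
    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }
}
```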
[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1126:

    Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893314#action_12893314 ]

Joydeep Sen Sarma commented on HIVE-1408:

yeah - so the solution is that mapred.local.dir needs to be set correctly in the hive/hadoop client-side xml. For our internal install, I will send a diff changing the client side to point to /tmp (instead of having a server-side config). There's nothing to do in the Hive open source version: mapred.local.dir is a client-only variable and needs to be set specifically on the client side by the admin. Basically, our internal client-side config has a bug :-)
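A hedged sketch of the kind of client-side setting the comment describes (the property name is from the discussion; the file placement and path value are hypothetical examples, not the actual internal diff):

```xml
<!-- Client-side configuration sketch (e.g. in the Hive client's
     mapred-site.xml); /tmp is the example location mentioned above. -->
<property>
  <name>mapred.local.dir</name>
  <value>/tmp</value>
  <description>Scratch space for local-mode map/reduce tasks;
    must be set on the client since local mode runs there.</description>
</property>
```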
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1289#action_1289 ]

John Sichi commented on HIVE-417:

Thanks Yongqiang. Looking at it now.

Implement Indexing in Hive

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, hive.indexing.12.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:

    Attachment: cas-handle.tar.gz

This is not a quality patch yet. I am still experimenting with some ideas. Everything is free-form and will likely change before the final patch. There are a few junk files (HiveIColumn, etc.) which will not be part of the release. Thus far:

CassandraSplit.java
HiveCassandraTableInputFormat.java
CassandraSerDe.java
TestColumnFamilyInputFormat.java
TestCassandraPut.java

are working and can give you an idea of where the code is going.

Cassandra Storage Handler

Key: HIVE-1434
URL: https://issues.apache.org/jira/browse/HIVE-1434
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Attachments: cas-handle.tar.gz, hive-1434-1.txt

Add a cassandra storage handler.
Fwd: [howldev] Initial thoughts on authorization in howl
Begin forwarded message:

From: Pradeep Kamath <prade...@yahoo-inc.com>
Date: July 27, 2010 4:38:42 PM PDT
To: howl...@yahoogroups.com
Subject: [howldev] Initial thoughts on authorization in howl
Reply-To: howl...@yahoogroups.com

The initial thoughts on authorization in howl are to model authorization (for DDL ops like create table/drop table/add partition etc.) after hdfs permissions. To be able to do this, we would like to extend createTable() to add the ability to record a different group from the user's primary group, and to record the complete unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with hive, I propose having conf variables to achieve these objectives:

- table.group.name: the value will indicate the name of the unix group for the table directory. This will be used by createTable() to perform a chgrp to the value provided. This property will give the user the ability to choose which of the many unix groups he is part of to associate with the table.
- table.permissions: the value will be of the form rwxrwxrwx to indicate read-write-execute permissions on the table directory. This will be used by createTable() to perform a chmod to the value provided. This will let the user decide what permissions he wants on the table.
- partitions.inherit.permissions: a value of true will indicate that partitions inherit the group name and permissions of the table-level directory. This will be used by addPartition() to perform a chgrp and chmod to the values as on the table directory.

I favor conf properties over API changes since the complete authorization design for hive is not finalized yet. These properties can be deprecated/removed when that is in place. These properties would also be useful to some installations of vanilla hive, since at least DFS-level authorization can now be achieved by hive without the user having to manually perform chgrp and chmod operations on DFS.

I would like to hear from hive developers/committers whether this would be acceptable for hive, and also thoughts from others.

Pradeep
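The proposed conf variables could look like the following sketch. Only the property names come from the proposal; the values are hypothetical examples, and the proposal itself is not finalized:

```xml
<!-- Sketch of the proposed howl metastore properties; example values only. -->
<property>
  <name>table.group.name</name>
  <value>analytics</value><!-- unix group createTable() would chgrp to -->
</property>
<property>
  <name>table.permissions</name>
  <value>rwxrwxr-x</value><!-- mode createTable() would chmod to -->
</property>
<property>
  <name>partitions.inherit.permissions</name>
  <value>true</value><!-- addPartition() copies the table dir's group/mode -->
</property>
```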
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893402#action_12893402 ]

John Sichi commented on HIVE-417:

+1. Will commit when tests pass. I noticed a number of trivial issues (like Javadoc mismatches) which I'll put in a followup.
[jira] Updated: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-417: Fix Version/s: 0.7.0
[jira] Created: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
FileSinkOperator should remove duplicated files from the same task based on file sizes -- Key: HIVE-1492 URL: https://issues.apache.org/jira/browse/HIVE-1492 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
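The retention logic described in the issue can be sketched as follows. This is a hedged illustration, not Hive's actual `Utilities.removeTempOrDuplicateFiles()` implementation: the class, method, and file names below are hypothetical, and a real implementation would parse Hadoop task-attempt names and consult the `FileSystem` for sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: among duplicate output files produced by the same task (failed
// attempts, speculative runs), keep only the largest file per task.
public class KeepLargest {
    // files maps an attempt file name (task id plus a trailing "_<attempt>"
    // suffix) to its size in bytes; returns the one file kept per task.
    static Map<String, Long> keepLargestPerTask(Map<String, Long> files) {
        // best: task id -> (file name, size) of the largest file seen so far
        TreeMap<String, Map.Entry<String, Long>> best = new TreeMap<>();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            String name = e.getKey();
            String taskId = name.substring(0, name.lastIndexOf('_'));
            Map.Entry<String, Long> cur = best.get(taskId);
            if (cur == null || e.getValue() > cur.getValue()) {
                best.put(taskId, e);
            }
        }
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : best.values()) {
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("task_000001_0", 100L); // first attempt wrote partial output
        files.put("task_000001_1", 250L); // retry wrote the complete file
        files.put("task_000002_0", 300L); // single attempt, no duplicates
        // prints {task_000001_1=250, task_000002_0=300}
        System.out.println(keepLargestPerTask(files));
    }
}
```

Keeping the largest file is a heuristic: a partial file from a failed attempt is shorter than the complete output, so size alone distinguishes them without reading the data.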
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893455#action_12893455 ] Joydeep Sen Sarma commented on HIVE-417: i am waiting for a commit on hive-1408. that's probably gonna collide.
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1492: - Attachment: HIVE-1492.patch
[jira] Updated: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1492: - Status: Patch Available (was: Open) Affects Version/s: 0.7.0
[jira] Assigned: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1492: Assignee: Ning Zhang
[jira] Commented: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893460#action_12893460 ] He Yongqiang commented on HIVE-1492: +1, looks good. will commit after tests pass.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893461#action_12893461 ] John Sichi commented on HIVE-417: - Thanks Joydeep. Yeah, this one has tons of plan diffs due to the virtual columns.
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893462#action_12893462 ] Joydeep Sen Sarma commented on HIVE-1408: - Ning - anything else u need from me? i was hoping to get it in before hive-417. otherwise i am sure would have to regenerate/reconcile a ton of stuff
[jira] Resolved: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1408. -- Fix Version/s: 0.7.0 Resolution: Fixed Committed. Thanks Joydeep!
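For reference, the two levels of options proposed in the HIVE-1408 description could be set in a user's hive-site.xml roughly as follows. This is an illustrative sketch only: `hive.exec.mode.local.auto` is the switch named in the issue, while the max-input-size property name and the `1G` value are taken verbatim from the issue's own example and may not match the config key actually committed.

```
<!-- Illustrative hive-site.xml fragment (names/values from the HIVE-1408
     description, not verified against the committed patch) -->
<property>
  <name>hive.exec.mode.local.auto</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.mode.local.auto.input.size.max</name>
  <value>1G</value>
</property>
```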
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893488#action_12893488 ] John Sichi commented on HIVE-417: - Yongqiang, I passed tests on Hadoop 0.20, but Ning has committed HIVE-1408, which conflicts, so you'll need to rebase against that and then I'll try again.