[jira] Commented: (HIVE-1508) Add cleanup method to HiveHistory class

2011-01-04 Thread Mohit Sikri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977185#action_12977185
 ] 

Mohit Sikri commented on HIVE-1508:
---

Great finding, but how do we curtail (or limit) the number of these 
files generated? Since I don't see any mechanism for deleting such files (say, 
deleting files older than an hour or so), they may eventually pile up to 
the point of consuming significant disk space.
I also see no value in these files once the session expires, or in keeping very 
old session information in them.
Kindly suggest.

 Add cleanup method to HiveHistory class
 ---

 Key: HIVE-1508
 URL: https://issues.apache.org/jira/browse/HIVE-1508
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Server Infrastructure
Reporter: Anurag Phadke
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.7.0

 Attachments: hive-1508-1-patch.txt


 Running the Hive server for a long time (> 90 minutes) results in too many open 
 file handles, eventually causing the server to crash as the server runs out 
 of file handles.
 Actual bug as described by Carl Steinbach:
 the hive_job_log_* files are created by the HiveHistory class. This class 
 creates a PrintWriter for writing to the file, but never closes the writer. 
 It looks like we need to add a cleanup method to HiveHistory that closes the 
 PrintWriter and does any other necessary cleanup. 
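 The fix described above can be sketched as follows. The class and field names
 here (HistoryLogger, histStream) are illustrative stand-ins, not Hive's actual
 HiveHistory API; the point is that closing the PrintWriter releases the file
 handle, and that the cleanup method should be safe to call more than once:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class HistoryLogger {
    private PrintWriter histStream;

    HistoryLogger(PrintWriter writer) {
        this.histStream = writer;
    }

    void log(String msg) {
        if (histStream != null) {
            histStream.println(msg);
        }
    }

    // The missing piece the issue asks for: closing the writer releases the
    // underlying file handle. Nulling the field makes a second call a no-op.
    void close() {
        if (histStream != null) {
            histStream.close();
            histStream = null;
        }
    }
}

public class Demo {
    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stands in for the log file
        HistoryLogger history = new HistoryLogger(new PrintWriter(sink));
        history.log("SessionStart");
        history.close();
        history.close(); // safe to call twice
        System.out.println(sink.toString().trim()); // prints SessionStart
    }
}
```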

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Regarding Hive History File(s).

2011-01-04 Thread Mohit
Hello All,

 

What is the purpose of maintaining hive history files which contain session
information like session start, query start, query end, task start, task end
etc.? Are they being used later (say by a tool) for some purpose?

 

I don't see these files ever being deleted from the system; is any
configuration needed to enable deletion, or is there a design
strategy/decision/rationale for not deleting them at all?

 

Also, in these files I don't see a session end message being logged; is it
reserved for future use?

 

-Mohit

 


***
This e-mail and attachments contain confidential information from HUAWEI,
which is intended only for the person or entity whose address is listed
above. Any use of the information contained herein in any way (including,
but not limited to, total or partial disclosure, reproduction, or
dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

 



Re: Regarding Hive History File(s).

2011-01-04 Thread Edward Capriolo
On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote:
 Hello All,



 What is the purpose of maintaining hive history files which contain session
 information like session start, query start, query end, task start, task end
 etc.? Are they being used later (say by a tool) for some purpose?



 I don't see these files ever being deleted from the system; is any
 configuration needed to enable deletion, or is there a design
 strategy/decision/rationale for not deleting them at all?



 Also, in these files I don't see a session end message being logged; is it
 reserved for future use?



 -Mohit






HiveHistory was added a while ago, between 3.0 and 4.0 (IIRC). A tool
to view them is HiveHistoryViewer in the API. I am not exactly sure
who is doing what with that data. The Web Interface does use it to
provide links to the JobTracker, so it is helpful for tracing all
the dependent jobs of a query after the fact.

There is a ticket open to customize the file location. I was also
thinking we should allow the user to supply a 'none' to turn the
feature off. As for cleanup and management, cron and rm seem like a
good fit.
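
The cron-and-rm idea can also be approximated in Java: a sweep that deletes
history files older than a cutoff. The directory, the hive_job_log_ prefix, and
the retention period below are assumptions for illustration, not Hive
configuration:

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

public class HistoryFileSweeper {

    // Delete files in dir whose name starts with prefix and whose last
    // modification time is older than maxAgeMillis; returns the count deleted.
    static int sweep(File dir, String prefix, long maxAgeMillis) {
        long cutoff = System.currentTimeMillis() - maxAgeMillis;
        int deleted = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // dir missing or unreadable
        }
        for (File f : files) {
            if (f.isFile() && f.getName().startsWith(prefix)
                    && f.lastModified() < cutoff && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws Exception {
        // Demo against the system temp dir: create one artificially old file,
        // then sweep anything matching the prefix that is older than a day.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File old = File.createTempFile("hive_job_log_", ".txt", dir);
        old.setLastModified(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(2));
        int n = sweep(dir, "hive_job_log_", TimeUnit.DAYS.toMillis(1));
        System.out.println(n >= 1); // prints true: at least the file we aged out
    }
}
```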


[jira] Updated: (HIVE-1818) Call frequency and duration metrics for HiveMetaStore via jmx

2011-01-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1818:
---

Fix Version/s: 0.7.0
   Status: Patch Available  (was: Open)

 Call frequency and duration metrics for HiveMetaStore via jmx
 -

 Key: HIVE-1818
 URL: https://issues.apache.org/jira/browse/HIVE-1818
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
 Fix For: 0.7.0

 Attachments: HIVE-1818-vs-1054860.patch, HIVE-1818.patch


 As recently brought up on the hive-dev mailing list, it would be useful if the 
 HiveMetaStore had some instrumentation capability to measure the 
 frequency of calls to the various HiveMetaStore methods and the duration of 
 time spent in those calls. 
 There are already incrementCounter() and logStartFunction() / 
 logStartTableFunction(), etc. calls in HiveMetaStore, and they could be 
 refactored/repurposed to expose JMX MBeans as well. Alternatively, a 
 Metrics subsystem could be introduced and the existing 
 incrementCounter() etc. calls refactored to use it.
 It might also be possible to specify a -D parameter that the Metrics 
 subsystem could use to determine whether or not to be enabled, and if so, on 
 what port. And once we have the capability to instrument and expose 
 MBeans, other subsystems could also adopt and use 
 this system.
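
 A minimal sketch of the JMX idea, assuming a simple counter-plus-duration
 standard MBean fed from logStartFunction()/logEnd-style hooks; all names below
 are illustrative, not the API the eventual patch introduces:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MetricsDemo {

    // Standard MBean interface: JMX derives the attributes GetTableCalls and
    // GetTableTotalMillis from these getters.
    public interface CallMetricsMBean {
        long getGetTableCalls();
        long getGetTableTotalMillis();
    }

    public static class CallMetrics implements CallMetricsMBean {
        private final AtomicLong calls = new AtomicLong();
        private final AtomicLong millis = new AtomicLong();

        public long getGetTableCalls() { return calls.get(); }
        public long getGetTableTotalMillis() { return millis.get(); }

        // This is where logStartFunction()/logEnd-style hooks would feed in
        // the duration of one completed call.
        public void record(long durationMillis) {
            calls.incrementAndGet();
            millis.addAndGet(durationMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        CallMetrics m = new CallMetrics();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Illustrative ObjectName; a real patch would pick its own domain.
        ObjectName name = new ObjectName("example.hive:type=MetaStoreCallMetrics");
        server.registerMBean(m, name);
        m.record(12);
        m.record(30);
        // The same values are now visible to jconsole or any other JMX client.
        System.out.println(server.getAttribute(name, "GetTableCalls") + " calls, "
                + server.getAttribute(name, "GetTableTotalMillis") + " ms total");
    }
}
```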

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.20 #467

2011-01-04 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/467/

--
[...truncated 14626 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table 

[jira] Updated: (HIVE-1101) Extend Hive ODBC to support more functions

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1101:
-

Component/s: (was: Drivers)
 ODBC

 Extend Hive ODBC to support more functions
 --

 Key: HIVE-1101
 URL: https://issues.apache.org/jira/browse/HIVE-1101
 Project: Hive
  Issue Type: New Feature
  Components: ODBC
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1101.patch, unixODBC-2.2.14-p2-HIVE-1101.tar.gz


 Currently the Hive ODBC driver only supports a minimal list of functions, 
 enough to make some applications work. Other applications require support for 
 more functions. These functions include:
 *SQLCancel
 *SQLFetchScroll
 *SQLGetData   
 *SQLGetInfo
 *SQLMoreResults
 *SQLRowCount
 *SQLSetConnectAtt
 *SQLSetStmtAttr
 *SQLEndTran
 *SQLPrepare
 *SQLNumParams
 *SQLDescribeParam
 *SQLBindParameter
 *SQLGetConnectAttr
 *SQLSetEnvAttr
 *SQLPrimaryKeys (not ODBC API? Hive does not support primary keys yet)
 *SQLForeignKeys (not ODBC API? Hive does not support foreign keys yet)
 We should support as many of them as possible. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1859) Hive's tinyint datatype is not supported by the Hive JDBC driver

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1859.
--

Resolution: Duplicate

 Hive's tinyint datatype is not supported by the Hive JDBC driver
 

 Key: HIVE-1859
 URL: https://issues.apache.org/jira/browse/HIVE-1859
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Create a Hive table containing a tinyint column.
 Then using the Hive JDBC driver execute a Hive query that selects data from 
 this table. 
 An error is then encountered.
Reporter: Guy le Mar

 java.sql.SQLException: Could not create ResultSet: 
 org.apache.hadoop.hive.serde2.dynamic_type.ParseException: Encountered byte 
 at line 1, column 47.
 Was expecting one of:
 bool ...
 i16 ...
 i32 ...
 i64 ...
 double ...
 string ...
 map ...
 list ...
 set ...
 required ...
 optional ...
 skip ...
 tok_int_constant ...
 IDENTIFIER ...
 } ...
 at 
 org.apache.hadoop.hive.jdbc.HiveResultSet.initDynamicSerde(HiveResultSet.java:120)
 at 
 org.apache.hadoop.hive.jdbc.HiveResultSet.init(HiveResultSet.java:74)
 at 
 org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:178)
 at com.quest.orahive.HiveJdbcClient.main(HiveJdbcClient.java:117)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1860) Hive's smallint datatype is not supported by the Hive JDBC driver

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1860:
-

Fix Version/s: 0.7.0

 Hive's smallint datatype is not supported by the Hive JDBC driver
 -

 Key: HIVE-1860
 URL: https://issues.apache.org/jira/browse/HIVE-1860
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Create a Hive table containing a smallint column.
 Then using the Hive JDBC driver execute a Hive query that selects data from 
 this table. 
 An error is then encountered.
Reporter: Guy le Mar
 Fix For: 0.7.0


 java.sql.SQLException: Inrecognized column type: i16
 at 
 org.apache.hadoop.hive.jdbc.HiveResultSetMetaData.getColumnType(HiveResultSetMetaData.java:132)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1860) Hive's smallint datatype is not supported by the Hive JDBC driver

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1860.
--

Resolution: Duplicate

 Hive's smallint datatype is not supported by the Hive JDBC driver
 -

 Key: HIVE-1860
 URL: https://issues.apache.org/jira/browse/HIVE-1860
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Create a Hive table containing a smallint column.
 Then using the Hive JDBC driver execute a Hive query that selects data from 
 this table. 
 An error is then encountered.
Reporter: Guy le Mar
 Fix For: 0.7.0


 java.sql.SQLException: Inrecognized column type: i16
 at 
 org.apache.hadoop.hive.jdbc.HiveResultSetMetaData.getColumnType(HiveResultSetMetaData.java:132)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1477) Specific JDBC driver's jar

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1477:
-

Component/s: (was: Drivers)
 JDBC

 Specific JDBC driver's jar
 --

 Key: HIVE-1477
 URL: https://issues.apache.org/jira/browse/HIVE-1477
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Reporter: Jerome Boulon

 Today we need to include Hadoop's jars in the client-side installation, but 
 since the JDBC driver uses Thrift, a smaller jar with only the Thrift classes 
 should be enough.
 This would avoid distributing the Hadoop jar on the client side.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1381) Async cancel for JDBC connection.

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1381:
-

Component/s: (was: Drivers)
 JDBC

 Async cancel for JDBC connection.
 -

 Key: HIVE-1381
 URL: https://issues.apache.org/jira/browse/HIVE-1381
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.5.0
Reporter: Jerome Boulon



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1052) Hive jdbc client - throws exception when query was too long

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1052.
--

Resolution: Won't Fix

We're not maintaining the 0.4.0 branch. Please upgrade to 0.6.0.
Resolving as wontfix.

 Hive jdbc client - throws exception when query was too long
 ---

 Key: HIVE-1052
 URL: https://issues.apache.org/jira/browse/HIVE-1052
 Project: Hive
  Issue Type: Bug
  Components: Drivers, Query Processor
Affects Versions: 0.4.0
Reporter: Vu Hoang

 I tried this query below
 {noformat}
 select columns from table where columnS='AAA' or columnS='BBB' or 
 columnS='CCC' or ... etc
 {noformat}
 it took a long time and threw an exception after that
 ... for hive jdbc
 {noformat}
 FAILED: Unknown exception: org.apache.hadoop.security.AccessControlException: 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=DrWho, access=WRITE, inode=hive-asadm:asadm:asadm:rwxr-xr-x
 {noformat}
 ... for hive shell bash
 {noformat}
 FAILED: Parse Error: line 1:21 mismatched input 'from' expecting EOF ()
 {noformat}
 :(

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1555) JDBC Storage Handler

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1555:
-

Component/s: (was: Drivers)
 JDBC

 JDBC Storage Handler
 

 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Reporter: Bob Robertson
   Original Estimate: 24h
  Remaining Estimate: 24h

 With the Cassandra and HBase Storage Handlers, I thought it would make sense 
 to include a generic JDBC RDBMS Storage Handler so that you could import a 
 standard DB table into Hive. Many people must want to perform HiveQL joins, 
 etc. against tables in other systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-187) ODBC driver

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-187:


Component/s: (was: Drivers)
 ODBC

 ODBC driver
 ---

 Key: HIVE-187
 URL: https://issues.apache.org/jira/browse/HIVE-187
 Project: Hive
  Issue Type: New Feature
  Components: Clients, ODBC
Reporter: Raghotham Murthy
Assignee: Eric Hwang
 Fix For: 0.4.0

 Attachments: HIVE-187.1.patch, HIVE-187.2.patch, HIVE-187.3.patch, 
 hive-187.4.patch, thrift_64.r790732.tgz, thrift_home_linux_32.tgz, 
 thrift_home_linux_64.tgz, unixODBC-2.2.14-1.tgz, unixODBC-2.2.14-2.tgz, 
 unixODBC-2.2.14-3.tgz, unixODBC-2.2.14-hive-patched.tar.gz, 
 unixODBC-2.2.14.tgz, unixodbc.patch


 We need to provide a small number of functions to get basic query
 execution and retrieval of results. This is based on the tutorial provided
 here: http://www.easysoft.com/developer/languages/c/odbc_tutorial.html
  
 The minimum set of ODBC functions required are:
 SQLAllocHandle - for environment, connection, statement
 SQLSetEnvAttr
 SQLDriverConnect
 SQLExecDirect
 SQLNumResultCols
 SQLFetch
 SQLGetData
 SQLDisconnect
 SQLFreeHandle
  
 If required the plan would be to do the following:
 1. generate c++ client stubs for thrift server
 2. implement the required functions in c++ by calling the c++ client
 3. make the c++ functions in (2) extern C and then use those in the odbc
 SQL* functions
 4. provide a .so (in linux) which can be used by the ODBC clients.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-567:


Component/s: (was: Drivers)
 JDBC

 jdbc: integrate hive with pentaho report designer
 -

 Key: HIVE-567
 URL: https://issues.apache.org/jira/browse/HIVE-567
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
 Fix For: 0.4.0

 Attachments: hive-567-server-output.txt, hive-567.1.patch, 
 hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz


 Instead of trying to get a complete implementation of JDBC, it's probably more 
 useful to pick reporting/analytics software out there and implement the JDBC 
 methods necessary to get it working. This JIRA is a first attempt at that. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-679) Integrate JDBC driver with SQuirrelSQL for querying

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-679:


Component/s: (was: Drivers)
 JDBC

 Integrate JDBC driver with SQuirrelSQL for querying
 ---

 Key: HIVE-679
 URL: https://issues.apache.org/jira/browse/HIVE-679
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.4.0

 Attachments: HIVE-679.1.patch, HIVE-679.2.branch-0.4.patch, 
 HIVE-679.2.patch


 Implement the JDBC methods required to support querying and other basic 
 commands using the SQuirrelSQL tool. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1126:
-

Component/s: (was: Drivers)
 JDBC

 Missing some Jdbc functionality like getTables getColumns and 
 HiveResultSet.get* methods based on column name.
 --

 Key: HIVE-1126
 URL: https://issues.apache.org/jira/browse/HIVE-1126
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.7.0

 Attachments: HIVE-1126-1.patch, HIVE-1126-2.patch, HIVE-1126-3.patch, 
 HIVE-1126-4.patch, HIVE-1126-5.patch, HIVE-1126-6.patch, HIVE-1126-7.patch, 
 HIVE-1126.patch, HIVE-1126_patch(0.5.0_source).patch


 I've been using the Hive JDBC driver more and more and was missing some 
 functionality, which I added:
 HiveDatabaseMetaData.getTables
 Uses show tables to get the info from Hive.
 HiveDatabaseMetaData.getColumns
 Uses describe tablename to get the columns.
 This makes using something like SQuirreL a lot nicer, since you have the list 
 of tables and can just click on the content tab to see what's in the table.
 I also implemented
 HiveResultSet.getObject(String columnName), so you can call most get* methods 
 based on the column name.
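
 The describe-based half of this approach can be sketched as follows, assuming
 (as the description above suggests) that describe tablename emits
 tab-separated lines of column name, type, and comment; parsing those lines
 yields roughly the rows a getColumns() implementation needs to return. The
 format is an assumption here, not a documented contract:

```java
import java.util.Arrays;
import java.util.List;

public class DescribeParser {

    // Turn "describe <table>" output lines into (name, type, comment) triples.
    static String[][] parse(List<String> describeLines) {
        String[][] cols = new String[describeLines.size()][];
        for (int i = 0; i < describeLines.size(); i++) {
            // limit -1 keeps a trailing empty comment field
            String[] parts = describeLines.get(i).split("\t", -1);
            cols[i] = new String[] {
                parts[0].trim(),                          // column name
                parts.length > 1 ? parts[1].trim() : "",  // type
                parts.length > 2 ? parts[2].trim() : ""   // comment
            };
        }
        return cols;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("id\tint\t", "name\tstring\tuser name");
        for (String[] c : parse(sample)) {
            System.out.println(c[0] + " : " + c[1]);
        }
    }
}
```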

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1378:
-

Component/s: (was: Drivers)
 JDBC

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
 HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1380) JDBC connection to be able to reattach to same session

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1380:
-

Component/s: (was: Drivers)
 JDBC

 JDBC connection to be able to reattach to same session
 --

 Key: HIVE-1380
 URL: https://issues.apache.org/jira/browse/HIVE-1380
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.5.0
Reporter: Jerome Boulon



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1688) In the MapJoinOperator, the code uses tag as alias, which is not always true

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1688:
-

Component/s: (was: Drivers)
 Query Processor

 In the MapJoinOperator, the code uses tag as alias, which is not always true
 

 Key: HIVE-1688
 URL: https://issues.apache.org/jira/browse/HIVE-1688
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the MapJoinOperator and SMBMapJoinOperator, the code uses tag as alias, 
 which is not always true.
 Actually, alias = order[tag]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1815:
-

Component/s: (was: Drivers)
 JDBC

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Fix For: 0.6.0


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)
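
 The requested behavior can be sketched as a client-side buffer that asks the
 server for N rows per round trip instead of one. FetchService below is a
 stand-in for the Thrift client, and fetchN is an assumed name, not the actual
 HiveServer API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class BatchDemo {

    public interface FetchService {
        List<String> fetchN(int n); // one server round trip, up to n rows
    }

    public static class BufferedRows {
        private final FetchService svc;
        private final int batchSize;
        private final Deque<String> buffer = new ArrayDeque<>();
        private boolean exhausted = false;

        public BufferedRows(FetchService svc, int batchSize) {
            this.svc = svc;
            this.batchSize = batchSize;
        }

        // Returns the next row, refilling the buffer in batches; null at end.
        public String next() {
            if (buffer.isEmpty() && !exhausted) {
                List<String> chunk = svc.fetchN(batchSize);
                if (chunk.isEmpty()) {
                    exhausted = true;
                }
                buffer.addAll(chunk);
            }
            return buffer.poll();
        }
    }

    public static void main(String[] args) {
        // Fake server: 10 rows, counting how many round trips are made.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add("row" + i);
        int[] trips = {0};
        Iterator<String> it = rows.iterator();
        FetchService fake = n -> {
            trips[0]++;
            List<String> out = new ArrayList<>();
            while (out.size() < n && it.hasNext()) out.add(it.next());
            return out;
        };
        BufferedRows rs = new BufferedRows(fake, 4);
        int count = 0;
        while (rs.next() != null) count++;
        // 10 rows cost 4 trips (4+4+2 plus one empty probe) instead of 10+.
        System.out.println(count + " rows in " + trips[0] + " trips");
    }
}
```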

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1816) Reporting of (seemingly inconsequential) transport exception has major impact on performance

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1816:
-

Component/s: (was: Drivers)
 JDBC

 Reporting of (seemingly inconsequential) transport exception has major impact 
 on performance
 

 Key: HIVE-1816
 URL: https://issues.apache.org/jira/browse/HIVE-1816
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results. 
Reporter: Guy le Mar
Priority: Minor

 During the process of executing a Hive query and then fetching the results, 
 the following stack trace is continually output to stderr.
 For the query I executed, 47 MB of this text was generated. As a consequence, 
 the performance of the application itself suffered.
 (Redirecting stderr to a file halved the time it took my application to fetch 
 the results, from 2 minutes down to 70 seconds.)
 Note, this also occurs if you use an application such as SQuirrel SQL 
 (http://www.squirrelsql.org) to execute a Hive query using the Hive JDBC 
 driver.
 The stack trace that is repeatedly reported is...
 org.apache.thrift.transport.TTransportException
   at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at 
 org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol$SimpleTransportTokenizer.fillTokenizer(TCTLSeparatedProtocol.java:215)
   at 
 org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol$SimpleTransportTokenizer.init(TCTLSeparatedProtocol.java:210)
   at 
 org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol.internalInitialize(TCTLSeparatedProtocol.java:336)
   at 
 org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol.initialize(TCTLSeparatedProtocol.java:417)
   at 
 org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:94)
   at 
 org.apache.hadoop.hive.jdbc.HiveResultSet.initDynamicSerde(HiveResultSet.java:117)
   at 
 org.apache.hadoop.hive.jdbc.HiveResultSet.init(HiveResultSet.java:74)
   at 
 org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:178)
   at com.quest.orahive.HiveJdbcClient.main(HiveJdbcClient.java:117)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-143) Remove the old Metastore

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-143:


Affects Version/s: (was: 0.6.0)
   (was: 0.4.0)

 Remove the old Metastore
 

 Key: HIVE-143
 URL: https://issues.apache.org/jira/browse/HIVE-143
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.3.0
Reporter: Johan Oskarsson
Assignee: Prasad Chakka
Priority: Minor
 Fix For: 0.4.0

 Attachments: hive-143.patch


 It is my understanding that there are two metastores, one HDFS based that 
 isn't being used anymore and one new based on SQL databases. This causes some 
 confusion and extra work for new developers, myself included. Am I correct in 
 thinking that the old Metastore won't be used anymore and could be removed?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1879) Remove hive.metastore.metadb.dir property from hive-default.xml and HiveConf

2011-01-04 Thread Carl Steinbach (JIRA)
Remove hive.metastore.metadb.dir property from hive-default.xml and HiveConf


 Key: HIVE-1879
 URL: https://issues.apache.org/jira/browse/HIVE-1879
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach


The file-based MetaStore implementation was removed in HIVE-143. We also need to
remove the hive.metastore.metadb.dir property from hive-default.xml and 
HiveConf, as well
as the references to this property that currently appear in HiveMetaStoreClient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1643) support range scans and non-key columns in HBase filter pushdown

2011-01-04 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977472#action_12977472
 ] 

John Sichi commented on HIVE-1643:
--

Notes for working on this:

Background is in

http://wiki.apache.org/hadoop/Hive/FilterPushdownDev

* In HiveHBaseTableInputFormat, newIndexPredicateAnalyzer needs to add 
additional operators (and stop restricting the allowed column names).  And then 
convertFilter needs to set up corresponding HBase filter conditions based on 
the predicates it finds.  Note that for inequality conditions on the key, it's 
necessary to muck with startRow/stopRow (not just the filter evaluator).

* See also the comment in HBaseStorageHandler.decomposePredicate.  Currently, 
it can only accept a single predicate.  If you want to be able to support AND 
of multiple predicates (using HBase's FilterList) then this will need to be 
relaxed.

* Beware of the fact that until HIVE-1538 gets committed, it is more difficult 
to make sure that the HBase-level filtering is working as expected.  The reason 
is that without HIVE-1538, a second copy of the filter gets applied within Hive 
(regardless of how the filter was decomposed when it was pushed down to HBase). 
 So even if HBase doesn't filter out everything you're expecting it to, you 
won't notice in the results since Hive will do the filtering again.
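
As a rough illustration of the startRow/stopRow point above (a hypothetical helper, not actual Hive code): an inclusive BETWEEN on the row key maps to a scan range, but HBase stop rows are exclusive, so the upper bound must be nudged past the last matching key, for example by appending a 0x00 byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RangeSketch {
    // For "rowkey BETWEEN lo AND hi": the start row is lo itself.
    static byte[] startRow(String lo) {
        return lo.getBytes(StandardCharsets.UTF_8);
    }

    // HBase's stop row is exclusive, so append a trailing 0x00 byte to hi
    // to keep hi itself inside the scanned range.
    static byte[] stopRowInclusive(String hi) {
        byte[] b = hi.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(b, b.length + 1); // extra slot is 0x00
    }

    public static void main(String[] args) {
        // These byte arrays would be handed to Scan.setStartRow/setStopRow.
        System.out.println(startRow("10").length);        // 2 bytes
        System.out.println(stopRowInclusive("20").length); // 3 bytes
    }
}
```

The filter evaluator alone cannot achieve this pruning: without adjusting the scan range, HBase would still walk every region.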


 support range scans and non-key columns in HBase filter pushdown
 

 Key: HIVE-1643
 URL: https://issues.apache.org/jira/browse/HIVE-1643
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0


 HIVE-1226 added support for WHERE rowkey=3.  We would like to support WHERE 
 rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus 
 conjunctions etc).  Non-rowkey conditions can't be used to filter out entire 
 ranges, but they can be used to push the per-row filter processing as far 
 down as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1880) Hive should verify that entries in hive.metastore.uris are Thrift URIs

2011-01-04 Thread Carl Steinbach (JIRA)
Hive should verify that entries in hive.metastore.uris are Thrift URIs
--

 Key: HIVE-1880
 URL: https://issues.apache.org/jira/browse/HIVE-1880
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach


The hive.metastore.uris configuration property contains a list of Thrift URLs
for remote Thrift metastores. These values are used if the user has specified a
non-local metastore configuration by setting hive.metastore.local=false.

HiveMetaStoreClient.openStore(URI) currently assumes that the URI is a Thrift
Binary Protocol endpoint. We should first check that the scheme of the URI is
thrift before attempting to open a Thrift binary connection to the host and
port specified in the URI.
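
A minimal sketch of such a check (the method name is hypothetical, not the actual HiveMetaStoreClient code), using java.net.URI to inspect the scheme:

```java
import java.net.URI;

public class MetastoreUriCheck {
    // Reject any metastore URI whose scheme is not "thrift" before we try
    // to open a Thrift binary connection to its host and port.
    static void validateThriftUri(URI uri) {
        if (!"thrift".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException(
                "hive.metastore.uris entry is not a Thrift URI: " + uri);
        }
    }

    public static void main(String[] args) {
        validateThriftUri(URI.create("thrift://metastore-host:9083")); // accepted
        try {
            validateThriftUri(URI.create("http://metastore-host:9083"));
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected non-thrift URI");
        }
    }
}
```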



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1880) Hive should verify that entries in hive.metastore.uris are Thrift URIs

2011-01-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977475#action_12977475
 ] 

Carl Steinbach commented on HIVE-1880:
--

It would also probably be useful to log a warning message if the configuration
has hive.metastore.local=true and values set for hive.metastore.uris, since in
this case the value of the latter property is ignored.

 Hive should verify that entries in hive.metastore.uris are Thrift URIs
 --

 Key: HIVE-1880
 URL: https://issues.apache.org/jira/browse/HIVE-1880
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 The hive.metastore.uris configuration property contains a list of Thrift URLs
 for remote Thrift metastores. These values are used if the user has specified
 a non-local metastore configuration by setting hive.metastore.local=false.
 HiveMetaStoreClient.openStore(URI) currently assumes that the URI is a Thrift
 Binary Protocol endpoint. We should first check that the scheme of the URI
 is thrift before attempting to open a Thrift binary connection to the host
 and port specified in the URI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



January Hive Contributors Meeting

2011-01-04 Thread Carl Steinbach
Announcing a new Meetup for Hive Contributors Group!

*What*: January Hive Contributors Meeting
http://www.meetup.com/Hive-Contributors-Group/calendar/15919127/

*When*: Tuesday, January 11, 2011 5:00 PM

*Where*: Cloudera
210 Portage Ave
Palo Alto, CA 94306

The next Hive Contributors Meeting will convene on Tuesday January 11th at
5pm at Cloudera's offices in Palo Alto.

Please RSVP if you plan to attend this event.

RSVP to this Meetup:
http://www.meetup.com/Hive-Contributors-Group/calendar/15919127/


[jira] Created: (HIVE-1882) Remove CHANGES.txt

2011-01-04 Thread Carl Steinbach (JIRA)
Remove CHANGES.txt
--

 Key: HIVE-1882
 URL: https://issues.apache.org/jira/browse/HIVE-1882
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach


I propose that we remove the CHANGES.txt file for the following reasons:

* It's a headache to maintain.
* It contains a lot of errors.
* It's redundant since this information is available in JIRA and via source 
control.
* The RELEASE_NOTES.txt file now contains the same information auto-generated 
by JIRA. We should update this file as part of the release process instead of 
updating CHANGES.txt on every commit.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Remove CHANGES.txt from Hive build

2011-01-04 Thread Carl Steinbach
Hi,

I just filed HIVE-1882 which proposes removing CHANGES.txt from the Hive
build. I think the following points support this change:

* It's a headache to maintain.

* It contains a lot of errors.

* It's redundant since this information is available in JIRA and via
  source control.

* The RELEASE_NOTES.txt file contains the same information
  auto-generated using JIRA. We should update this file as part of the
  release process instead of updating CHANGES.txt on every commit.

I plan to bring this up at the contrib meeting on Tuesday, but wanted to
mention it in advance so people have a chance to think about it.

Thanks.

Carl


[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1881:
-

Component/s: Metastore
Description: @Yongqiang: What's the motivation for doing this?

 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch


 @Yongqiang: What's the motivation for doing this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977555#action_12977555
 ] 

Carl Steinbach commented on HIVE-1881:
--

Review posted here: https://reviews.apache.org/r/210/


 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch


 @Yongqiang: What's the motivation for doing this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Review Request: HIVE-1881: Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/210/#review84
---



http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java
https://reviews.apache.org/r/210/#comment154

I think this should be HiveConf instead of Configuration.



http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java
https://reviews.apache.org/r/210/#comment153

I think you can simplify the interface by making getSessionConfStore() private,
and then calling it from getConf() and setConf(), which can now be made static.
Then you'll be able to call

SessionConfStore.getConf()

instead of

SessionConfStore.getSessionConfStore().getConf()
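
The suggested shape, sketched below with a plain String standing in for HiveConf (hypothetical names; the real class would hold a HiveConf per session):

```java
public class SessionConfStoreSketch {
    // One store per thread/session; callers never see the store instance.
    private static final ThreadLocal<SessionConfStoreSketch> STORE =
        ThreadLocal.withInitial(SessionConfStoreSketch::new);

    private String conf; // stand-in for a HiveConf instance

    // Now private: only the static facade methods reach the store.
    private static SessionConfStoreSketch getStore() {
        return STORE.get();
    }

    public static String getConf() { return getStore().conf; }
    public static void setConf(String c) { getStore().conf = c; }

    public static void main(String[] args) {
        setConf("hive.metastore.fs.impl=default");
        System.out.println(getConf()); // single-call access, as suggested
    }
}
```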




http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml
https://reviews.apache.org/r/210/#comment155

I don't understand the motivation for this change, but assuming that
FsShell provides features unavailable in FileSystem, is there any reason
why we can't replace the FileSystem based implementation with the new
one that uses FsShell?



- Carl


On 2011-01-04 16:41:46, Carl Steinbach wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/210/
 ---
 
 (Updated 2011-01-04 16:41:46)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Review 
 https://issues.apache.org/jira/secure/attachment/12467491/HIVE-1881.1.patch
 
 
 This addresses bug HIVE-1881.
 https://issues.apache.org/jira/browse/HIVE-1881
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  1055171 
   
 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java
  PRE-CREATION 
   http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml 1055171 
   
 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
  1055171 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
  1055171 
 
 Diff: https://reviews.apache.org/r/210/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Carl
 




[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore

2011-01-04 Thread Mac Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mac Yang updated HIVE-1862:
---

Status: Patch Available  (was: Open)

 Revive partition filtering in the Hive MetaStore
 

 Key: HIVE-1862
 URL: https://issues.apache.org/jira/browse/HIVE-1862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Devaraj Das
 Fix For: 0.7.0

 Attachments: HIVE-1862.1.patch.txt


 HIVE-1853 downgraded the JDO version. This makes the feature of partition
 filtering in the metastore unusable. This jira is to keep track of the lost
 feature and to discuss approaches to bring it back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore

2011-01-04 Thread Mac Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mac Yang updated HIVE-1862:
---

Attachment: HIVE-1862.1.patch.txt

Datanucleus 2.0.3 does not support the get() method on Collection, which the
partition filtering code depends on in order to retrieve the value for a
particular partition and use it for filtering.

The submitted patch is a quick workaround. It uses the substring() function
to extract the partition value out of the partitionName field, and thus
eliminates the need for the get() method.

However, this approach does not work if the partition value contains special
characters, because the partitionName has the special characters escaped;
hence the partition value generated using the substring() approach is also in
the escaped form. Here is the list of special characters for reference:
'', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{', ']'

While this solution is incomplete, I am hoping this submission will trigger
more suggestions and ideas.
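
A toy illustration of the substring() idea (plain Java standing in for the JDOQL expression; class and method names are hypothetical): given a partitionName like "ds=2011-01-04/hr=12", the value for a key is cut out between the '=' after the key and the next '/'.

```java
public class PartitionNameSketch {
    // Extract the (possibly escaped) value for partKey out of a
    // partitionName of the form "k1=v1/k2=v2/...". Returns null if absent.
    static String valueOf(String partitionName, String partKey) {
        int start = partitionName.indexOf(partKey + "=");
        if (start < 0) return null;
        start += partKey.length() + 1;           // skip past "key="
        int end = partitionName.indexOf('/', start);
        return end < 0 ? partitionName.substring(start)
                       : partitionName.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(valueOf("ds=2011-01-04/hr=12", "hr")); // 12
        // Caveat from the comment above: escaped characters stay escaped,
        // so a value stored in escaped form is returned as-is.
    }
}
```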

 Revive partition filtering in the Hive MetaStore
 

 Key: HIVE-1862
 URL: https://issues.apache.org/jira/browse/HIVE-1862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Devaraj Das
 Fix For: 0.7.0

 Attachments: HIVE-1862.1.patch.txt


 HIVE-1853 downgraded the JDO version. This makes the feature of partition
 filtering in the metastore unusable. This jira is to keep track of the lost
 feature and to discuss approaches to bring it back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1862) Revive partition filtering in the Hive MetaStore

2011-01-04 Thread Mac Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977583#action_12977583
 ] 

Mac Yang commented on HIVE-1862:


A quick note about the patch: the workaround is implemented in
ExpressionTree.java. The rest of the patch just adds back old code that was
removed as part of HIVE-1853.

 Revive partition filtering in the Hive MetaStore
 

 Key: HIVE-1862
 URL: https://issues.apache.org/jira/browse/HIVE-1862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Devaraj Das
 Fix For: 0.7.0

 Attachments: HIVE-1862.1.patch.txt


 HIVE-1853 downgraded the JDO version. This makes the feature of partition
 filtering in the metastore unusable. This jira is to keep track of the lost
 feature and to discuss approaches to bring it back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1881:
---

Description: 
@Yongqiang: What's the motivation for doing this?
This is to work with some internal hacky code for doing deletes. There should
be no impact if you use open-source Hadoop.

But the idea here is to give users two options for doing the delete. In
Facebook, we have some customized code in FsShell which can control whether
the delete should go through the trash or not.

  was:@Yongqiang: What's the motivation for doing this?


 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch


 @Yongqiang: What's the motivation for doing this?
 This is to work with some internal hacky codes about doing delete. There 
 should be no impact if you use open source hadoop.
 But the idea here is to give users 2 options to do the delete. In Facebook, 
 we have some customized code in FsShell which can control whether the delete 
 should go through trash or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1151) Add 'show version' command to Hive CLI

2011-01-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1151:
-

Component/s: CLI

 Add 'show version' command to Hive CLI
 --

 Key: HIVE-1151
 URL: https://issues.apache.org/jira/browse/HIVE-1151
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Clients
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 At a minimum this command should return the version information obtained
 from the hive-cli jar. Ideally this command will also return version 
 information
 obtained from each of the hive jar files present in the CLASSPATH, which
 will allow us to quickly detect cases where people are using incompatible
 jars.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977591#action_12977591
 ] 

He Yongqiang commented on HIVE-1881:


{quote}
I don't understand the motivation for this change, but assuming that
FsShell provides features unavailable in FileSystem, is there any reason
why we can't replace the FileSystem based implementation with the new
one that uses FsShell?
{quote}

Yeah, we can replace it completely. But there is an overhead to using FsShell,
since it requires a new process. We just want to go to the new code path only
when needed. For normal cases, just keep the old behavior.

{quote}
I think you can simplify the interface by making getSessionConfStore() private,
and then calling it from getConf() and setConf() which can now be made static. 
Then
you'll be able to call

SessionConfStore.getConf()

instead of

SessionConfStore.getSessionConfStore().getConf()
{quote}


will do it and upload a new patch. 

Thanks!

 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch


 @Yongqiang: What's the motivation for doing this?
 This is to work with some internal hacky codes about doing delete. There 
 should be no impact if you use open source hadoop.
 But the idea here is to give users 2 options to do the delete. In Facebook, 
 we have some customized code in FsShell which can control whether the delete 
 should go through trash or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977594#action_12977594
 ] 

Carl Steinbach commented on HIVE-1881:
--

I'm concerned that this patch introduces two new configuration properties that 
don't make sense
to anyone outside of Facebook. I think we need to avoid doing this since it 
makes the configuration
process more complicated (it's already complicated enough), and also introduces 
an untested 
code path.

Instead, I'd like to propose that we define a MetaStoreFs interface that 
defines createDir and
deleteDir methods, etc, along with a default implementation and the ability to 
plug in other
implementations by setting a new hive.metastore.fs.impl configuration property.

What do you think?

 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch


 @Yongqiang: What's the motivation for doing this?
 This is to work with some internal hacky codes about doing delete. There 
 should be no impact if you use open source hadoop.
 But the idea here is to give users 2 options to do the delete. In Facebook, 
 we have some customized code in FsShell which can control whether the delete 
 should go through trash or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1881:
---

Attachment: HIVE-1881.2.patch

 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch, HIVE-1881.2.patch


 @Yongqiang: What's the motivation for doing this?
 This is to work with some internal hacky codes about doing delete. There 
 should be no impact if you use open source hadoop.
 But the idea here is to give users 2 options to do the delete. In Facebook, 
 we have some customized code in FsShell which can control whether the delete 
 should go through trash or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Regarding Hive History File(s).

2011-01-04 Thread Mohit
Hmm, OK. I think the creation and cleanup of resources should be part of the
same system; let's not hand it over to the cron utility. Users might not know,
or need not know, which files to delete, when to delete them, or from where.

What about a timer task which cleans up files older than a configured elapsed
time, say deleting files an hour or a week old?

I'm raising a new JIRA for this and will provide the patch.

OK, you are talking about HIVE-1708. If it is about changing the file
location, one can do that by overriding the hive.querylog.location property in
hive-default.xml. I will comment on that.

 

 

-Mohit


***

This e-mail and attachments contain confidential information from HUAWEI,
which is intended only for the person or entity whose address is listed
above. Any use of the information contained herein in any way (including,
but not limited to, total or partial disclosure, reproduction, or
dissemination) by persons other than the intended recipient's) is
prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Tuesday, January 04, 2011 8:03 PM
To: mohitsi...@huawei.com
Cc: hive-...@hadoop.apache.org; c...@cloudera.com
Subject: Re: Regarding Hive History File(s).

 

On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote:

 Hello All,

 What is the purpose of maintaining hive history files which contain session
 information like session start, query start, query end, task start, task end
 etc.? Are they being used later (say by a tool) for some purpose?

 I don't see these files getting deleted from the system; is any
 configuration needed to be set to enable deletion, or is there any design
 strategy/decision/rationale for not deleting them at all?

 Also, in these files I don't see the session end message being logged; is it
 reserved for future use?

 -Mohit

 

 

 




 

 

 

HiveHistory was added a while ago, between 3.0 and 4.0 (IIRC). A tool to view
them is HiveHistoryViewer in the API. I am not exactly sure who is doing what
with that data. The Web Interface does use it to provide links to the
JobTracker, so it is helpful for trying to trace all the dependent jobs of a
query after the fact.

There is a ticket open to customize the file location. I was also thinking we
should allow the user to supply a 'none' to turn off the feature. As for
cleanup and management, cron and rm seem like a good fit.



[jira] Commented: (HIVE-1708) make hive history file configurable

2011-01-04 Thread Mohit Sikri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977633#action_12977633
 ] 

Mohit Sikri commented on HIVE-1708:
---

Well, it's not the case I observed. I added the below property

<property>
  <name>hive.querylog.location</name>
  <value>/tmp/tansactionhist</value>
  <description>Location for the hive query log. Default value is /tmp/${user.name}</description>
</property>

into hive-default.xml, and it is creating files under /tmp/transactionhist
directory.
Kindly confirm once.

 make hive history file configurable
 ---

 Key: HIVE-1708
 URL: https://issues.apache.org/jira/browse/HIVE-1708
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain

 Currently, it is derived from
 System.getProperty("user.home") + "/.hivehistory";

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1883) Periodic cleanup of Hive History log files.

2011-01-04 Thread Mohit Sikri (JIRA)
Periodic cleanup of Hive History log files.
---

 Key: HIVE-1883
 URL: https://issues.apache.org/jira/browse/HIVE-1883
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
 Environment: Hive 0.6.0,  Hadoop 0.20.1

SUSE Linux Enterprise Server 11 (i586)
VERSION = 11
PATCHLEVEL = 0

Reporter: Mohit Sikri


After starting Hive and running queries, transaction history files are created
in the /tmp/root folder. These files should be removed periodically (not all of
them, but those which are too old to represent any significant information).

Solution:
A scheduled timer task which cleans up the log files older than the configured
time.
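
A minimal sketch of such a cleanup task, assuming the history files live in a single directory (the directory path, age threshold, and class name are illustrative, not part of the proposed patch):

```java
import java.io.File;
import java.util.Timer;
import java.util.TimerTask;

public class HistoryCleaner {
    // Delete regular files in dir whose last-modified time is older than
    // maxAgeMs; returns the number of files removed.
    static int purgeOldFiles(File dir, long maxAgeMs) {
        long cutoff = System.currentTimeMillis() - maxAgeMs;
        int deleted = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;
        for (File f : files) {
            if (f.isFile() && f.lastModified() < cutoff && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        final File logDir = new File("/tmp/root");     // illustrative location
        final long maxAge = 7L * 24 * 60 * 60 * 1000;  // e.g. one week
        // Daemon timer: runs hourly and never blocks JVM shutdown.
        new Timer("history-cleaner", true).schedule(new TimerTask() {
            @Override public void run() { purgeOldFiles(logDir, maxAge); }
        }, 0, 60 * 60 * 1000);
    }
}
```

Both the age threshold and the period would presumably come from new configuration properties rather than constants.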


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1840) Support ALTER DATABASE to change database properties

2011-01-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1840:
-

Attachment: HIVE-1840.patch

 Support ALTER DATABASE to change database properties
 

 Key: HIVE-1840
 URL: https://issues.apache.org/jira/browse/HIVE-1840
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Attachments: HIVE-1840.patch


 This is a follow-up to HIVE-1836

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1840) Support ALTER DATABASE to change database properties

2011-01-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1840:
-

Status: Patch Available  (was: Open)

 Support ALTER DATABASE to change database properties
 

 Key: HIVE-1840
 URL: https://issues.apache.org/jira/browse/HIVE-1840
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Attachments: HIVE-1840.patch


 This is a follow-up to HIVE-1836

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Regarding Hive History File(s).

2011-01-04 Thread Carl Steinbach
Hi Mohit,

Usually it's the Ops/IT staff that ends up managing things like a production
HiveServer instance, and in a UNIX shop I suspect that most of these folks
are already going to be familiar with using cron and logrotate (
http://linuxcommand.org/man_pages/logrotate8.html) to manage the logs
produced by their other server systems.

Building a log rotation feature into HiveServer defies this convention and
will force people to learn how to configure a new log rotation system
specific to HiveServer. It also requires us to write, debug, document and
maintain code that isn't really necessary. I think the best approach is to
take advantage of what already exists by documenting Hive's logging behavior
in the Admin manual and providing a sample logrotate configuration file.
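
For instance, a sample stanza might look something like the following (the glob, rotation count, and schedule are purely illustrative, assuming the default /tmp/<user> query-log location):

```
/tmp/*/hive_job_log_*.txt {
    weekly
    rotate 4
    missingok
    notifempty
    compress
}
```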

Thanks.

Carl

On Tue, Jan 4, 2011 at 9:41 PM, Mohit mohitsi...@huawei.com wrote:

  hmm, ok , I think the process of creating and cleanup of resources should
 be the part of the same system, lets not hand it over to cron utility, users
 might not be knowing or need not to know what files to delete, when to
 delete, from where to delete.



 What about a timer task which cleans up these files older than the
 configured elapsed time say a deleting files an hour old or a week old.?



 I'm raising new JIRA for this and will provide the patch.



 Ok, you are talking about HIVE-1708, WELL If it is about changing the file
 location, one can do that by overriding the property *hive.querylog.location
 *by adding into hive-default.xml. I will comment on that.





 -Mohit





 -Original Message-
 From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
 Sent: Tuesday, January 04, 2011 8:03 PM
 To: mohitsi...@huawei.com
 Cc: hive-...@hadoop.apache.org; c...@cloudera.com
 Subject: Re: Regarding Hive History File(s).



 On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote:

  Hello All,

  What is the purpose of maintaining hive history files which contain
  session information like session start, query start, query end, task
  start, task end etc.? Are they being used later (say by a tool) for some
  purpose?

  I don't see these files getting deleted from the system; is any
  configuration needed to be set to enable deletion, or is there any design
  strategy/decision/rationale for not deleting them at all?

  Also, in these files I don't see the session end message being logged; is
  it reserved for future use?

  -Mohit


 HiveHistory was added a while ago between 3.0 and 4.0 (iirc). A tool to
 view them is HiveHistoryViewer in the API. I am not exactly sure who is
 doing what with that data. The Web Interface does use it to provide links
 to the JobTracker, so it is helpful for trying to trace all the dependent
 jobs of a query after the fact.

 There is a ticket open to customize the file location. I was also thinking
 we should allow the user to supply a 'none' to turn off the feature. As for
 clean up and management, cron and rm seem like a good fit.
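
 As a rough sketch of that cron-and-rm approach, assuming the history files
 follow the hive_job_log_* naming and sit in a single directory (both
 assumptions; check hive.querylog.location on your installation):

```shell
#!/bin/sh
# cleanup_hive_history DIR [DAYS]: delete hive_job_log_* files in DIR
# older than DAYS days (default 7). Helper name and defaults are
# illustrative, not part of Hive.
cleanup_hive_history() {
    dir="$1"
    days="${2:-7}"
    find "$dir" -maxdepth 1 -type f -name 'hive_job_log_*' -mtime +"$days" -delete
}
```

 A crontab entry could then run it nightly, e.g.
 `0 3 * * * /usr/local/bin/cleanup_hive_history /tmp/$USER 7`
 (path and schedule are illustrative).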



[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse

2011-01-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977666#action_12977666
 ] 

Namit Jain commented on HIVE-1881:
--

Talking offline with Yongqiang, the Facebook-specific implementation of this
interface need not be checked into open source, nor is there any need to
document the new configuration parameter in open source, since this
parameter only makes sense in the Facebook environment.

 Add an option to use FsShell to delete dir in warehouse
 ---

 Key: HIVE-1881
 URL: https://issues.apache.org/jira/browse/HIVE-1881
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1881.1.patch, HIVE-1881.2.patch


 @Yongqiang: What's the motivation for doing this?
 This is to work with some internal hacky code for doing deletes. There
 should be no impact if you use open-source Hadoop.
 But the idea here is to give users two options for the delete. In Facebook,
 we have some customized code in FsShell which can control whether the
 delete should go through the trash or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.