[jira] Commented: (HIVE-1508) Add cleanup method to HiveHistory class
[ https://issues.apache.org/jira/browse/HIVE-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977185#action_12977185 ] Mohit Sikri commented on HIVE-1508: --- Great finding, but how do we curtail (or put a limit on) the number of these files generated? Since I don't see any mechanism for deleting such files (say, deleting files older than an hour or so), they may eventually pile up to the point of consuming significant disk space. I don't see any significance in these files once the session expires, or in keeping very old session information in them. Kindly suggest. Add cleanup method to HiveHistory class --- Key: HIVE-1508 URL: https://issues.apache.org/jira/browse/HIVE-1508 Project: Hive Issue Type: Bug Components: Metastore, Server Infrastructure Reporter: Anurag Phadke Assignee: Edward Capriolo Priority: Blocker Fix For: 0.7.0 Attachments: hive-1508-1-patch.txt Running the Hive server for a long time (90 minutes) results in too many open file handles, eventually causing the server to crash as it runs out of file handles. Actual bug as described by Carl Steinbach: the hive_job_log_* files are created by the HiveHistory class. This class creates a PrintWriter for writing to the file, but never closes the writer. It looks like we need to add a cleanup method to HiveHistory that closes the PrintWriter and does any other necessary cleanup. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
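The shape of the fix Carl describes can be sketched roughly as follows. This is a standalone illustration, not Hive's actual HiveHistory code; the class and method names here are hypothetical:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of a session-scoped history log that, unlike the
// original HiveHistory, releases its file handle when the session ends.
public class SessionHistory {
    private final PrintWriter writer;
    private final Path logFile;

    public SessionHistory(Path logFile) throws IOException {
        this.logFile = logFile;
        // One PrintWriter (and therefore one OS file handle) per session.
        this.writer = new PrintWriter(Files.newBufferedWriter(logFile));
    }

    public void log(String event) {
        writer.println(event);
    }

    // The cleanup method HIVE-1508 asks for: flush and close the writer
    // so a long-running server does not leak one handle per session.
    public void cleanup() {
        writer.flush();
        writer.close();
    }

    public Path getLogFile() {
        return logFile;
    }
}
```

Calling cleanup() at session teardown is what bounds the number of open handles, independent of how many sessions the server has served.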
Regarding Hive History File(s).
Hello All, What is the purpose of maintaining the Hive history files, which contain session information like session start, query start, query end, task start, task end, etc.? Are they used later (say, by a tool) for some purpose? I don't see these files being deleted from the system; is any configuration needed to enable deletion, or is there a design strategy/decision/rationale for not deleting them at all? Also, in these files I don't see a session-end message being logged; is it reserved for future use? -Mohit
Re: Regarding Hive History File(s).
On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote: [...] HiveHistory was added a while ago, between 3.0 and 4.0 (iirc). A tool to view the files is HiveHistoryViewer in the API. I am not exactly sure who is doing what with that data. The Web Interface does use it to provide links to the JobTracker, so it is helpful for tracing all the dependent jobs of a query after the fact. There is a ticket open to customize the file location. I was also thinking we should allow the user to supply a 'none' to turn the feature off. As for cleanup and management, cron and rm seem like a good fit.
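Edward's cron-and-rm suggestion amounts to a periodic retention sweep over the history directory. The same policy can be sketched in Java; the glob pattern matches the hive_job_log_* names mentioned in HIVE-1508, but the directory and age threshold are deployment choices, not Hive defaults:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

// Sketch of a retention sweep: delete hive_job_log_* files whose last
// modification is older than a cutoff. Equivalent in spirit to a cron
// job running find/rm over the history directory.
public class HistoryLogSweeper {
    public static int deleteOlderThan(Path dir, Instant cutoff) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(dir, "hive_job_log_*")) {
            for (Path f : files) {
                if (Files.getLastModifiedTime(f).toInstant().isBefore(cutoff)) {
                    Files.delete(f);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

A scheduler (cron, or a timer thread inside the server) would call deleteOlderThan with, say, a cutoff of now minus a few hours.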
[jira] Updated: (HIVE-1818) Call frequency and duration metrics for HiveMetaStore via jmx
[ https://issues.apache.org/jira/browse/HIVE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1818: --- Fix Version/s: 0.7.0 Status: Patch Available (was: Open) Call frequency and duration metrics for HiveMetaStore via jmx - Key: HIVE-1818 URL: https://issues.apache.org/jira/browse/HIVE-1818 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Fix For: 0.7.0 Attachments: HIVE-1818-vs-1054860.patch, HIVE-1818.patch As recently brought up on the hive-dev mailing list, it would be useful if the HiveMetaStore had some sort of instrumentation capability to measure the frequency of calls to the various HiveMetaStore methods and the time spent in those calls. There are already incrementCounter() and logStartFunction()/logStartTableFunction(), etc. calls in HiveMetaStore, and they could be refactored/repurposed to make calls that expose JMX MBeans as well. Alternatively, a Metrics subsystem could be introduced that the incrementCounter()/etc. calls are refactored to use. It might also be possible to specify a -D parameter that the Metrics subsystem could use to determine whether it is enabled and, if so, on what port. And once we have the capability to instrument and expose MBeans, other subsystems could adopt and use this system as well.
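The refactor the ticket suggests, routing the existing bookkeeping calls through one metrics object that tracks both call frequency and cumulative duration, can be sketched as below. This is a hypothetical illustration, not the patch's actual classes; in the real patch the numbers would additionally be published as JMX MBean attributes, which is omitted here:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: per-call counters and cumulative durations, the raw data a
// JMX MBean for the metastore would expose.
public class CallMetrics {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, AtomicLong> nanos = new ConcurrentHashMap<>();

    // Called where logStartFunction()/incrementCounter() are today.
    public long startFunction(String call) {
        counts.computeIfAbsent(call, k -> new AtomicLong()).incrementAndGet();
        return System.nanoTime();
    }

    // Called when the metastore method returns.
    public void endFunction(String call, long startedAtNanos) {
        nanos.computeIfAbsent(call, k -> new AtomicLong())
             .addAndGet(System.nanoTime() - startedAtNanos);
    }

    public long callCount(String call) {
        AtomicLong c = counts.get(call);
        return c == null ? 0 : c.get();
    }

    public long totalNanos(String call) {
        AtomicLong n = nanos.get(call);
        return n == null ? 0 : n.get();
    }
}
```

callCount and totalNanos map naturally onto MBean attribute getters, which is what makes the numbers visible in jconsole or any JMX client.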
Build failed in Hudson: Hive-trunk-h0.20 #467
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/467/
-- [...truncated 14626 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table
[jira] Updated: (HIVE-1101) Extend Hive ODBC to support more functions
[ https://issues.apache.org/jira/browse/HIVE-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1101: - Component/s: (was: Drivers) ODBC Extend Hive ODBC to support more functions -- Key: HIVE-1101 URL: https://issues.apache.org/jira/browse/HIVE-1101 Project: Hive Issue Type: New Feature Components: ODBC Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1101.patch, unixODBC-2.2.14-p2-HIVE-1101.tar.gz Currently the Hive ODBC driver only supports a minimal list of functions, enough to make some applications work. Other applications require support for more functions, including:
* SQLCancel
* SQLFetchScroll
* SQLGetData
* SQLGetInfo
* SQLMoreResults
* SQLRowCount
* SQLSetConnectAttr
* SQLSetStmtAttr
* SQLEndTran
* SQLPrepare
* SQLNumParams
* SQLDescribeParam
* SQLBindParameter
* SQLGetConnectAttr
* SQLSetEnvAttr
* SQLPrimaryKeys (not an ODBC API? Hive does not support primary keys yet)
* SQLForeignKeys (not an ODBC API? Hive does not support foreign keys yet)
We should support as many of them as possible.
[jira] Resolved: (HIVE-1859) Hive's tinyint datatype is not supported by the Hive JDBC driver
[ https://issues.apache.org/jira/browse/HIVE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1859. -- Resolution: Duplicate Hive's tinyint datatype is not supported by the Hive JDBC driver Key: HIVE-1859 URL: https://issues.apache.org/jira/browse/HIVE-1859 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.5.0 Environment: Create a Hive table containing a tinyint column. Then using the Hive JDBC driver execute a Hive query that selects data from this table. An error is then encountered. Reporter: Guy le Mar java.sql.SQLException: Could not create ResultSet: org.apache.hadoop.hive.serde2.dynamic_type.ParseException: Encountered byte at line 1, column 47. Was expecting one of: bool ... i16 ... i32 ... i64 ... double ... string ... map ... list ... set ... required ... optional ... skip ... tok_int_constant ... IDENTIFIER ... } ... at org.apache.hadoop.hive.jdbc.HiveResultSet.initDynamicSerde(HiveResultSet.java:120) at org.apache.hadoop.hive.jdbc.HiveResultSet.init(HiveResultSet.java:74) at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:178) at com.quest.orahive.HiveJdbcClient.main(HiveJdbcClient.java:117)
[jira] Updated: (HIVE-1860) Hive's smallint datatype is not supported by the Hive JDBC driver
[ https://issues.apache.org/jira/browse/HIVE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1860: - Fix Version/s: 0.7.0 Hive's smallint datatype is not supported by the Hive JDBC driver - Key: HIVE-1860 URL: https://issues.apache.org/jira/browse/HIVE-1860 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.5.0 Environment: Create a Hive table containing a smallint column. Then using the Hive JDBC driver execute a Hive query that selects data from this table. An error is then encountered. Reporter: Guy le Mar Fix For: 0.7.0 java.sql.SQLException: Inrecognized column type: i16 at org.apache.hadoop.hive.jdbc.HiveResultSetMetaData.getColumnType(HiveResultSetMetaData.java:132)
[jira] Resolved: (HIVE-1860) Hive's smallint datatype is not supported by the Hive JDBC driver
[ https://issues.apache.org/jira/browse/HIVE-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1860. -- Resolution: Duplicate Hive's smallint datatype is not supported by the Hive JDBC driver - Key: HIVE-1860 URL: https://issues.apache.org/jira/browse/HIVE-1860 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.5.0 Environment: Create a Hive table containing a smallint column. Then using the Hive JDBC driver execute a Hive query that selects data from this table. An error is then encountered. Reporter: Guy le Mar Fix For: 0.7.0 java.sql.SQLException: Inrecognized column type: i16 at org.apache.hadoop.hive.jdbc.HiveResultSetMetaData.getColumnType(HiveResultSetMetaData.java:132)
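Both this issue and HIVE-1859 come down to the driver's mapping from server-reported type names to JDBC types missing entries. Conceptually the fix is an exhaustive mapping like the sketch below, which is a standalone illustration rather than the driver's actual code (the exact set of type-name strings is an assumption):

```java
import java.sql.SQLException;
import java.sql.Types;

// Illustrative mapping from the Thrift/serde type names the Hive server
// reports to java.sql.Types constants. HIVE-1859/1860 arise because the
// driver lacked the "byte" (tinyint) and "i16" (smallint) cases.
public class HiveTypeMapping {
    public static int toJdbcType(String thriftType) throws SQLException {
        switch (thriftType) {
            case "byte":   return Types.TINYINT;   // Hive tinyint
            case "i16":    return Types.SMALLINT;  // Hive smallint
            case "i32":    return Types.INTEGER;
            case "i64":    return Types.BIGINT;
            case "bool":   return Types.BOOLEAN;
            case "double": return Types.DOUBLE;
            case "string": return Types.VARCHAR;
            default:
                throw new SQLException("Unrecognized column type: " + thriftType);
        }
    }
}
```

An exhaustive switch with a loud default is preferable to silently returning a catch-all type, since it turns a missing mapping into a clear SQLException like the one reported here.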
[jira] Updated: (HIVE-1477) Specific JDBC driver's jar
[ https://issues.apache.org/jira/browse/HIVE-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1477: - Component/s: (was: Drivers) JDBC Specific JDBC driver's jar -- Key: HIVE-1477 URL: https://issues.apache.org/jira/browse/HIVE-1477 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Jerome Boulon Today we need to include Hadoop's jar in the client-side installation, but since the JDBC driver uses Thrift, a smaller jar with only the Thrift classes should be enough. This would avoid distributing the Hadoop jar on the client side.
[jira] Updated: (HIVE-1381) Async cancel for JDBC connection.
[ https://issues.apache.org/jira/browse/HIVE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1381: - Component/s: (was: Drivers) JDBC Async cancel for JDBC connection. - Key: HIVE-1381 URL: https://issues.apache.org/jira/browse/HIVE-1381 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.5.0 Reporter: Jerome Boulon
[jira] Resolved: (HIVE-1052) Hive jdbc client - throws exception when query was too long
[ https://issues.apache.org/jira/browse/HIVE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1052. -- Resolution: Won't Fix We're not maintaining the 0.4.0 branch. Please upgrade to 0.6.0. Resolving as wontfix. Hive jdbc client - throws exception when query was too long --- Key: HIVE-1052 URL: https://issues.apache.org/jira/browse/HIVE-1052 Project: Hive Issue Type: Bug Components: Drivers, Query Processor Affects Versions: 0.4.0 Reporter: Vu Hoang I tried the query below {noformat} select columns from table where columnS='AAA' or columnS='BBB' or columnS='CCC' or ... etc {noformat} It took a long time and threw an exception afterwards ... for hive jdbc {noformat} FAILED: Unknown exception: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode=hive-asadm:asadm:asadm:rwxr-xr-x {noformat} ... for the hive shell {noformat} FAILED: Parse Error: line 1:21 mismatched input 'from' expecting EOF () {noformat} :(
[jira] Updated: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1555: - Component/s: (was: Drivers) JDBC JDBC Storage Handler Key: HIVE-1555 URL: https://issues.apache.org/jira/browse/HIVE-1555 Project: Hive Issue Type: New Feature Components: JDBC Reporter: Bob Robertson Original Estimate: 24h Remaining Estimate: 24h With the Cassandra and HBase Storage Handlers I thought it would make sense to include a generic JDBC RDBMS Storage Handler, so that you could import a standard DB table into Hive. Many people must want to perform HiveQL joins, etc. against tables in other systems.
[jira] Updated: (HIVE-187) ODBC driver
[ https://issues.apache.org/jira/browse/HIVE-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-187: Component/s: (was: Drivers) ODBC ODBC driver --- Key: HIVE-187 URL: https://issues.apache.org/jira/browse/HIVE-187 Project: Hive Issue Type: New Feature Components: Clients, ODBC Reporter: Raghotham Murthy Assignee: Eric Hwang Fix For: 0.4.0 Attachments: HIVE-187.1.patch, HIVE-187.2.patch, HIVE-187.3.patch, hive-187.4.patch, thrift_64.r790732.tgz, thrift_home_linux_32.tgz, thrift_home_linux_64.tgz, unixODBC-2.2.14-1.tgz, unixODBC-2.2.14-2.tgz, unixODBC-2.2.14-3.tgz, unixODBC-2.2.14-hive-patched.tar.gz, unixODBC-2.2.14.tgz, unixodbc.patch We need to provide a small number of functions to get basic query execution and retrieval of results. This is based on the tutorial provided here: http://www.easysoft.com/developer/languages/c/odbc_tutorial.html The minimum set of ODBC functions required is:
SQLAllocHandle - for environment, connection, statement
SQLSetEnvAttr
SQLDriverConnect
SQLExecDirect
SQLNumResultCols
SQLFetch
SQLGetData
SQLDisconnect
SQLFreeHandle
If required, the plan would be to do the following:
1. generate c++ client stubs for the thrift server
2. implement the required functions in c++ by calling the c++ client
3. make the c++ functions in (2) extern C and then use those in the odbc SQL* functions
4. provide a .so (in linux) which can be used by the ODBC clients.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-567: Component/s: (was: Drivers) JDBC jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz Instead of trying to get a complete implementation of JDBC, it's probably more useful to pick reporting/analytics software out there and implement the JDBC methods necessary to get it working. This JIRA is a first attempt at that.
[jira] Updated: (HIVE-679) Integrate JDBC driver with SQuirrelSQL for querying
[ https://issues.apache.org/jira/browse/HIVE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-679: Component/s: (was: Drivers) JDBC Integrate JDBC driver with SQuirrelSQL for querying --- Key: HIVE-679 URL: https://issues.apache.org/jira/browse/HIVE-679 Project: Hive Issue Type: New Feature Components: JDBC Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.4.0 Attachments: HIVE-679.1.patch, HIVE-679.2.branch-0.4.patch, HIVE-679.2.patch Implement the JDBC methods required to support querying and other basic commands using the SQuirrelSQL tool.
[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
[ https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1126: - Component/s: (was: Drivers) JDBC Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name. -- Key: HIVE-1126 URL: https://issues.apache.org/jira/browse/HIVE-1126 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Bennie Schut Assignee: Bennie Schut Fix For: 0.7.0 Attachments: HIVE-1126-1.patch, HIVE-1126-2.patch, HIVE-1126-3.patch, HIVE-1126-4.patch, HIVE-1126-5.patch, HIVE-1126-6.patch, HIVE-1126-7.patch, HIVE-1126.patch, HIVE-1126_patch(0.5.0_source).patch I've been using the Hive JDBC driver more and more and was missing some functionality, which I have added: HiveDatabaseMetaData.getTables, using show tables to get the info from Hive, and HiveDatabaseMetaData.getColumns, using describe tablename to get the columns. This makes using something like SQuirreL a lot nicer, since you have the list of tables and can just click on the content tab to see what's in a table. I also implemented HiveResultSet.getObject(String columnName), so you can call most get* methods based on the column name.
[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1378: - Component/s: (was: Drivers) JDBC Return value for map, array, and struct needs to return a string - Key: HIVE-1378 URL: https://issues.apache.org/jira/browse/HIVE-1378 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Jerome Boulon Assignee: Steven Wong Fix For: 0.7.0 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, HIVE-1378.patch In order to be able to select/display any data from the Hive JDBC driver, the return value for map, array, and struct needs to be a string.
[jira] Updated: (HIVE-1380) JDBC connection to be able to reattach to same session
[ https://issues.apache.org/jira/browse/HIVE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1380: - Component/s: (was: Drivers) JDBC JDBC connection to be able to reattach to same session -- Key: HIVE-1380 URL: https://issues.apache.org/jira/browse/HIVE-1380 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.5.0 Reporter: Jerome Boulon
[jira] Updated: (HIVE-1688) In the MapJoinOperator, the code uses tag as alias, which is not always true
[ https://issues.apache.org/jira/browse/HIVE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1688: - Component/s: (was: Drivers) Query Processor In the MapJoinOperator, the code uses tag as alias, which is not always true Key: HIVE-1688 URL: https://issues.apache.org/jira/browse/HIVE-1688 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Original Estimate: 24h Remaining Estimate: 24h In the MapJoinOperator and SMBMapJoinOperator, the code uses the tag as the alias, which is not always correct. Actually, alias = order[tag].
[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1815: - Component/s: (was: Drivers) JDBC The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.5.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Fix For: 0.6.0 When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a result set of anything other than a trivial size. It would be nice for HiveResultSet to be able to fetch N rows from the server at a time, so that performance is adequate for applications that involve human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.)
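The batching the ticket asks for is a standard cursor-buffering pattern: pull up to N rows per server round trip and hand them out from a local buffer. A self-contained sketch, where fetchN stands in for a hypothetical "fetch N rows" server call (not the driver's actual API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Sketch of client-side batch fetching: one server call per fetchSize
// rows instead of one call per row, which is what made the original
// HiveResultSet so slow for non-trivial result sets.
public class BatchingCursor implements Iterator<String> {
    private final IntFunction<List<String>> fetchN; // stand-in for the server call
    private final int fetchSize;
    private final List<String> buffer = new ArrayList<>();
    private boolean exhausted = false;

    public BatchingCursor(IntFunction<List<String>> fetchN, int fetchSize) {
        this.fetchN = fetchN;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) {
            List<String> batch = fetchN.apply(fetchSize);
            if (batch.isEmpty()) exhausted = true;
            else buffer.addAll(batch);
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.remove(0);
    }
}
```

With fetchSize = 100, fetching 4000 rows costs about 40 round trips instead of 4000, which is where the interactive-use speedup comes from.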
[jira] Updated: (HIVE-1816) Reporting of (seemingly inconsequential) transport exception has major impact on performance
[ https://issues.apache.org/jira/browse/HIVE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1816: - Component/s: (was: Drivers) JDBC Reporting of (seemingly inconsequential) transport exception has major impact on performance Key: HIVE-1816 URL: https://issues.apache.org/jira/browse/HIVE-1816 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.5.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Priority: Minor During the process of executing a Hive query and then fetching the results, the following stack trace is continually output to stderr. For the query I executed, 47 MB of this text was generated. As a consequence, the performance of the application itself suffered. (Redirecting stderr to a file halved the time it took my application to fetch the results - from 2 minutes down to 70 seconds.) Note, this also occurs if you use an application such as SQuirrel SQL (http://www.squirrelsql.org) to execute a Hive query using the Hive JDBC driver. The stack trace that is repeatedly reported is...
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol$SimpleTransportTokenizer.fillTokenizer(TCTLSeparatedProtocol.java:215) at org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol$SimpleTransportTokenizer.init(TCTLSeparatedProtocol.java:210) at org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol.internalInitialize(TCTLSeparatedProtocol.java:336) at org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol.initialize(TCTLSeparatedProtocol.java:417) at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:94) at org.apache.hadoop.hive.jdbc.HiveResultSet.initDynamicSerde(HiveResultSet.java:117) at org.apache.hadoop.hive.jdbc.HiveResultSet.init(HiveResultSet.java:74) at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:178) at com.quest.orahive.HiveJdbcClient.main(HiveJdbcClient.java:117)
[jira] Updated: (HIVE-143) Remove the old Metastore
[ https://issues.apache.org/jira/browse/HIVE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-143: Affects Version/s: (was: 0.6.0) (was: 0.4.0) Remove the old Metastore Key: HIVE-143 URL: https://issues.apache.org/jira/browse/HIVE-143 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.3.0 Reporter: Johan Oskarsson Assignee: Prasad Chakka Priority: Minor Fix For: 0.4.0 Attachments: hive-143.patch It is my understanding that there are two metastores: an HDFS-based one that isn't used anymore, and a new one based on SQL databases. This causes some confusion and extra work for new developers, myself included. Am I correct in thinking that the old Metastore won't be used anymore and could be removed?
[jira] Created: (HIVE-1879) Remove hive.metastore.metadb.dir property from hive-default.xml and HiveConf
Remove hive.metastore.metadb.dir property from hive-default.xml and HiveConf Key: HIVE-1879 URL: https://issues.apache.org/jira/browse/HIVE-1879 Project: Hive Issue Type: Bug Components: Configuration, Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach The file-based MetaStore implementation was removed in HIVE-143. We also need to remove the hive.metastore.metadb.dir property from hive-default.xml and HiveConf, as well as the references to this property that currently appear in HiveMetaStoreClient.
[jira] Commented: (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977472#action_12977472 ] John Sichi commented on HIVE-1643: -- Notes for working on this: Background is in http://wiki.apache.org/hadoop/Hive/FilterPushdownDev
* In HiveHBaseTableInputFormat, newIndexPredicateAnalyzer needs to add additional operators (and stop restricting the allowed column names). And then convertFilter needs to set up corresponding HBase filter conditions based on the predicates it finds. Note that for inequality conditions on the key, it's necessary to muck with startRow/stopRow (not just the filter evaluator).
* See also the comment in HBaseStorageHandler.decomposePredicate. Currently, it can only accept a single predicate. If you want to be able to support AND of multiple predicates (using HBase's FilterList) then this will need to be relaxed.
* Beware of the fact that until HIVE-1538 gets committed, it is more difficult to make sure that the HBase-level filtering is working as expected. The reason is that without HIVE-1538, a second copy of the filter gets applied within Hive (regardless of how the filter was decomposed when it was pushed down to HBase). So even if HBase doesn't filter out everything you're expecting it to, you won't notice in the results since Hive will do the filtering again.
support range scans and non-key columns in HBase filter pushdown Key: HIVE-1643 URL: https://issues.apache.org/jira/browse/HIVE-1643 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.7.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1880) Hive should verify that entries in hive.metastore.uris are Thrift URIs
Hive should verify that entries in hive.metastore.uris are Thrift URIs -- Key: HIVE-1880 URL: https://issues.apache.org/jira/browse/HIVE-1880 Project: Hive Issue Type: Bug Components: Configuration, Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach The hive.metastore.uris configuration property contains a list of Thrift URIs for remote Thrift metastores. These values are used if the user has specified a non-local metastore configuration by setting hive.metastore.local=false. HiveMetaStoreClient.openStore(URI) currently makes the assumption that the URI is a Thrift Binary Protocol endpoint. We should first check to make sure that the scheme of the URI is thrift before attempting to open a Thrift binary connection to the host and port specified in the URI. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
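The proposed scheme check is straightforward; here is a hedged sketch using only java.net.URI (the class and method names are illustrative, not the real HiveMetaStoreClient code):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the proposed validation: before opening a Thrift
// connection, verify that an entry from hive.metastore.uris actually uses
// the "thrift" scheme and names a usable host:port endpoint.
public class MetastoreUriCheck {
    static boolean isThriftUri(String s) {
        try {
            URI uri = new URI(s);
            return "thrift".equalsIgnoreCase(uri.getScheme())
                && uri.getHost() != null
                && uri.getPort() != -1;  // a binary endpoint needs host:port
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

With a check like this in place, a misconfigured `http://` entry would be rejected up front instead of failing obscurely when the Thrift handshake breaks.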
[jira] Commented: (HIVE-1880) Hive should verify that entries in hive.metastore.uris are Thrift URIs
[ https://issues.apache.org/jira/browse/HIVE-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977475#action_12977475 ] Carl Steinbach commented on HIVE-1880: -- It would also probably be useful to log a warning message if the configuration has hive.metastore.local=true and values set for hive.metastore.uris, since in this case the value of the latter property is ignored. Hive should verify that entries in hive.metastore.uris are Thrift URIs -- Key: HIVE-1880 URL: https://issues.apache.org/jira/browse/HIVE-1880 Project: Hive Issue Type: Bug Components: Configuration, Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach The hive.metastore.uris configuration property contains a list of Thrift URIs for remote Thrift metastores. These values are used if the user has specified a non-local metastore configuration by setting hive.metastore.local=false. HiveMetaStoreClient.openStore(URI) currently makes the assumption that the URI is a Thrift Binary Protocol endpoint. We should first check to make sure that the scheme of the URI is thrift before attempting to open a Thrift binary connection to the host and port specified in the URI. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
January Hive Contributors Meeting
Announcing a new Meetup for Hive Contributors Group! *What*: January Hive Contributors Meeting (http://www.meetup.com/Hive-Contributors-Group/calendar/15919127/) *When*: Tuesday, January 11, 2011 5:00 PM *Where*: Cloudera 210 Portage Ave Palo Alto, CA 94306 The next Hive Contributors Meeting will convene on Tuesday January 11th at 5pm at Cloudera's offices in Palo Alto. Please RSVP if you plan to attend this event. RSVP to this Meetup: http://www.meetup.com/Hive-Contributors-Group/calendar/15919127/
[jira] Created: (HIVE-1882) Remove CHANGES.txt
Remove CHANGES.txt -- Key: HIVE-1882 URL: https://issues.apache.org/jira/browse/HIVE-1882 Project: Hive Issue Type: Task Components: Build Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach I propose that we remove the CHANGES.txt file for the following reasons: * It's a headache to maintain. * It contains a lot of errors. * It's redundant since this information is available in JIRA and via source control. * The RELEASE_NOTES.txt file now contains the same information auto-generated by JIRA. We should update this file as part of the release process instead of updating CHANGES.txt on every commit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Remove CHANGES.txt from Hive build
Hi, I just filed HIVE-1882 which proposes removing CHANGES.txt from the Hive build. I think the following points support this change: * It's a headache to maintain. * It contains a lot of errors. * It's redundant since this information is available in JIRA and via source control. * The RELEASE_NOTES.txt file contains the same information auto-generated using JIRA. We should update this file as part of the release process instead of updating CHANGES.txt on every commit. I plan to bring this up at the contrib meeting on Tuesday, but wanted to mention it in advance so people have a chance to think about it. Thanks. Carl
[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1881: - Component/s: Metastore Description: @Yongqiang: What's the motivation for doing this? Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch @Yongqiang: What's the motivation for doing this? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977555#action_12977555 ] Carl Steinbach commented on HIVE-1881: -- Review posted here: https://reviews.apache.org/r/210/ Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch @Yongqiang: What's the motivation for doing this? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Review Request: HIVE-1881: Add an option to use FsShell to delete dir in warehouse
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/210/#review84 --- http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java https://reviews.apache.org/r/210/#comment154 I think this should be HiveConf instead of Configuration. http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java https://reviews.apache.org/r/210/#comment153 I think you can simplify the interface by making getSessionConfStore() private, and then calling it from getConf() and setConf() which can now be made static. Then you'll be able to call SessionConfStore.getConf() instead of SessionConfStore.getSessionConfStore().getConf() http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml https://reviews.apache.org/r/210/#comment155 I don't understand the motivation for this change, but assuming that FsShell provides features unavailable in FileSystem, is there any reason why we can't replace the FileSystem based implementation with the new one that uses FsShell? - Carl On 2011-01-04 16:41:46, Carl Steinbach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/210/ --- (Updated 2011-01-04 16:41:46) Review request for hive. Summary --- Review https://issues.apache.org/jira/secure/attachment/12467491/HIVE-1881.1.patch This addresses bug HIVE-1881. 
https://issues.apache.org/jira/browse/HIVE-1881 Diffs - http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1055171 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/SessionConfStore.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml 1055171 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1055171 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1055171 Diff: https://reviews.apache.org/r/210/diff Testing --- Thanks, Carl
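The static-facade refactoring suggested in the review can be sketched as follows. A ThreadLocal map of strings stands in for the real per-session HiveConf, and the key-based getConf(String)/setConf(String, String) signatures are a simplification of the actual interface; the point is only the shape of the API, where the store lookup becomes a private detail:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested refactoring: getSessionConfStore() goes private,
// and static getConf()/setConf() call it internally, so callers write
// SessionConfStore.getConf(...) rather than
// SessionConfStore.getSessionConfStore().getConf(...).
public class SessionConfStore {
    private static final ThreadLocal<SessionConfStore> STORE =
        ThreadLocal.withInitial(SessionConfStore::new);

    // Stand-in for the real per-session HiveConf object.
    private final Map<String, String> conf = new HashMap<>();

    // The lookup is now an implementation detail.
    private static SessionConfStore getSessionConfStore() {
        return STORE.get();
    }

    public static String getConf(String key) {
        return getSessionConfStore().conf.get(key);
    }

    public static void setConf(String key, String value) {
        getSessionConfStore().conf.put(key, value);
    }
}
```

The design win is purely ergonomic: call sites shrink and the session-scoping mechanism can change later without touching them.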
[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mac Yang updated HIVE-1862: --- Status: Patch Available (was: Open) Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and discussing approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mac Yang updated HIVE-1862: --- Attachment: HIVE-1862.1.patch.txt Datanucleus 2.0.3 does not support the get() method on Collection, which the partition filtering code depends on in order to retrieve the value for a particular partition and use it for filtering. The submitted patch is a quick work around. It uses the substring() function to extract the partition value out of the partitionName field, and thus eliminates the need of the get() method. However, this approach does not work if the partition value contains special characters. This is because the partitionName has the special characters escaped. Hence the partition value generated using the substring() approach is also in the escaped form. Here is the list of special characters for reference purpose, '', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{', ']' While this solution is incomplete, I am hoping this submission will trigger more suggestions and ideas. Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and discussing approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
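The substring() workaround Mac describes, and its escaping pitfall, can be illustrated with a plain-Java model. The partitionName layout and the percent-style escape shown here are assumptions for illustration, not the exact metastore encoding:

```java
// Sketch of the workaround: given a partitionName like
// "ds=2011-01-04/region=us", extract the value for one partition key by
// slicing between "key=" and the next "/". The pitfall: special characters
// are stored escaped in partitionName, so the sliced value comes back in
// escaped form and fails to match the user's literal filter value.
public class PartitionNameSketch {
    static String valueOf(String partitionName, String key) {
        String marker = key + "=";
        int start = partitionName.indexOf(marker);
        if (start < 0) return null;          // key not present
        start += marker.length();
        int end = partitionName.indexOf('/', start);
        return end < 0 ? partitionName.substring(start)
                       : partitionName.substring(start, end);
    }
}
```

For a name like `ts=12%3A30` the extracted value is the escaped `12%3A30`, not `12:30`, which is exactly why the patch is incomplete for values containing the listed special characters.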
[jira] Commented: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977583#action_12977583 ] Mac Yang commented on HIVE-1862: A quick note about the patch. The work around is implemented in ExpressionTree.java. The rest of the patch just added back old code that was removed as part of HIVE-1853. Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and discussing approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1881: --- Description: @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. was:@Yongqiang: What's the motivation for doing this? Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1151) Add 'show version' command to Hive CLI
[ https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1151: - Component/s: CLI Add 'show version' command to Hive CLI -- Key: HIVE-1151 URL: https://issues.apache.org/jira/browse/HIVE-1151 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach At a minimum this command should return the version information obtained from the hive-cli jar. Ideally this command will also return version information obtained from each of the hive jar files present in the CLASSPATH, which will allow us to quickly detect cases where people are using incompatible jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977591#action_12977591 ] He Yongqiang commented on HIVE-1881: {quote} I don't understand the motivation for this change, but assuming that FsShell provides features unavailable in FileSystem, is there any reason why we can't replace the FileSystem based implementation with the new one that uses FsShell? {quote} Yeah, we can replace it completely. But there is an overhead of using FsShell since it requires a new process. We just want to go to the new code path only when needed; for normal cases, just keep the old behavior. {quote} I think you can simplify the interface by making getSessionConfStore() private, and then calling it from getConf() and setConf() which can now be made static. Then you'll be able to call SessionConfStore.getConf() instead of SessionConfStore.getSessionConfStore().getConf() {quote} Will do it and upload a new patch. Thanks! Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977594#action_12977594 ] Carl Steinbach commented on HIVE-1881: -- I'm concerned that this patch introduces two new configuration properties that don't make sense to anyone outside of Facebook. I think we need to avoid doing this since it makes the configuration process more complicated (it's already complicated enough), and also introduces an untested code path. Instead, I'd like to propose that we define a MetaStoreFs interface that defines createDir and deleteDir methods, etc, along with a default implementation and the ability to plug in other implementations by setting a new hive.metastore.fs.impl configuration property. What do you think? Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
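Carl's proposed MetaStoreFs interface might look roughly like this. This is a sketch under stated assumptions: the interface, the default implementation, and the hive.metastore.fs.impl property do not exist yet, and java.io.File stands in for Hadoop's Path/FileSystem types to keep the sketch self-contained:

```java
import java.io.File;

// Hypothetical shape of the proposed pluggable filesystem shim: a small
// interface with a default implementation, with alternatives (e.g. an
// FsShell-backed one) selected via a new hive.metastore.fs.impl property.
interface MetaStoreFs {
    boolean createDir(File dir);
    boolean deleteDir(File dir);
}

class DefaultMetaStoreFs implements MetaStoreFs {
    @Override public boolean createDir(File dir) {
        return dir.mkdirs() || dir.isDirectory();
    }

    @Override public boolean deleteDir(File dir) {
        // Recursive delete, analogous to FileSystem.delete(path, true).
        File[] children = dir.listFiles();
        if (children != null) {
            for (File c : children) {
                if (c.isDirectory()) deleteDir(c); else c.delete();
            }
        }
        return dir.delete();
    }
}
```

This keeps the Facebook-specific FsShell variant out of the default code path while still leaving a supported extension point, which is the crux of Carl's objection to the two new properties.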
[jira] Updated: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1881: --- Attachment: HIVE-1881.2.patch Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch, HIVE-1881.2.patch @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Regarding Hive History File(s).
Hmm, ok. I think creating and cleaning up resources should be the responsibility of the same system; let's not hand it over to the cron utility, since users may not know (or need to know) which files to delete, when to delete them, or where from. What about a timer task that cleans up files older than a configured elapsed time, say an hour or a week? I'm raising a new JIRA for this and will provide the patch. Ok, you are talking about HIVE-1708. If it is about changing the file location, one can do that by overriding the property hive.querylog.location in hive-default.xml. I will comment on that. -Mohit *** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Tuesday, January 04, 2011 8:03 PM To: mohitsi...@huawei.com Cc: hive-...@hadoop.apache.org; c...@cloudera.com Subject: Re: Regarding Hive History File(s). On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote: Hello All, What is the purpose of maintaining hive history files which contain session information like session start, query start, query end, task start, task end etc.? Are they being used later (say by a tool) for some purpose? I don't see these files being getting deleted from the system ;any configuration needed to be set to enable deletion or Is there any design strategy/decision/rationale for not deleting them at all? Also, in these files I don't see the session end message being logged, is it reserved for future use? 
-Mohit *** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! HiveHistory was added a while ago between 3.0 and 4.0 (iirc). A tool to view them is HiveHistoryViewer in the API. I am not exactly sure who is doing what with that data. The Web Interface does use it to provide links to the JobTracker. So it helpful for trying to trace all the dependant jobs of a query after the fact. There is a ticket open to customize the file location. I was also thinking we should allow the user to supply a 'none' to turn off the feature. As for clean up and management cron and rm seem like a good fit.
[jira] Commented: (HIVE-1708) make hive history file configurable
[ https://issues.apache.org/jira/browse/HIVE-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977633#action_12977633 ] Mohit Sikri commented on HIVE-1708: --- Well, that's not the case I observed. I added the below property into hive-default.xml:

<property>
  <name>hive.querylog.location</name>
  <value>/tmp/tansactionhist</value>
  <description>Location for the hive query log. Default value is /tmp/${user.name}</description>
</property>

and it is creating files under the /tmp/transactionhist directory. Kindly confirm once. make hive history file configurable --- Key: HIVE-1708 URL: https://issues.apache.org/jira/browse/HIVE-1708 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Currently, it is derived from System.getProperty(user.home)/.hivehistory; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1883) Periodic cleanup of Hive History log files.
Periodic cleanup of Hive History log files. --- Key: HIVE-1883 URL: https://issues.apache.org/jira/browse/HIVE-1883 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Environment: Hive 0.6.0, Hadoop 0.20.1 SUSE Linux Enterprise Server 11 (i586) VERSION = 11 PATCHLEVEL = 0 Reporter: Mohit Sikri After starting Hive and running queries, transaction history files get created in the /tmp/root folder. We should periodically remove the files (not all of them) that are too old to represent any significant information. Solution: a scheduled timer task that cleans up log files older than a configured age. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
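The cleanup HIVE-1883 proposes could be sketched as a TimerTask plus a pure staleness check. All names, the maxAge parameter, and the hive_job_log_ filename prefix are illustrative, not actual Hive code:

```java
import java.io.File;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of a scheduled history-file cleanup: periodically
// delete hive_job_log_* files older than a configured age.
public class HistoryCleanupSketch {
    // Pure staleness predicate, separated out so it is easy to test.
    static boolean isStale(String name, long lastModified, long now, long maxAgeMillis) {
        return name.startsWith("hive_job_log_") && now - lastModified > maxAgeMillis;
    }

    // Sweep one directory; returns the number of files deleted.
    static int deleteOlderThan(File dir, long maxAgeMillis) {
        int deleted = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;           // dir missing or unreadable
        long now = System.currentTimeMillis();
        for (File f : files) {
            if (f.isFile()
                && isStale(f.getName(), f.lastModified(), now, maxAgeMillis)
                && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    // Run the sweep on a fixed schedule using a daemon timer thread.
    static Timer schedule(final File dir, final long maxAgeMillis, long periodMillis) {
        Timer timer = new Timer("hive-history-cleanup", /*isDaemon=*/true);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() { deleteOlderThan(dir, maxAgeMillis); }
        }, periodMillis, periodMillis);
        return timer;
    }
}
```

Note this is the in-process alternative Mohit argues for; the thread that follows makes the case that cron/logrotate outside the server is the more conventional choice.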
[jira] Updated: (HIVE-1840) Support ALTER DATABASE to change database properties
[ https://issues.apache.org/jira/browse/HIVE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1840: - Attachment: HIVE-1840.patch Support ALTER DATABASE to change database properties Key: HIVE-1840 URL: https://issues.apache.org/jira/browse/HIVE-1840 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1840.patch This is a follow-up to HIVE-1836 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1840) Support ALTER DATABASE to change database properties
[ https://issues.apache.org/jira/browse/HIVE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1840: - Status: Patch Available (was: Open) Support ALTER DATABASE to change database properties Key: HIVE-1840 URL: https://issues.apache.org/jira/browse/HIVE-1840 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1840.patch This is a follow-up to HIVE-1836 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Regarding Hive History File(s).
Hi Mohit, Usually it's the Ops/IT staff that ends up managing things like a production HiveServer instance, and in a UNIX shop I suspect that most of these folks are already going to be familiar with using cron and logrotate ( http://linuxcommand.org/man_pages/logrotate8.html) to manage the logs produced by their other server systems. Building a log rotation feature into HiveServer defies this convention and will force people to learn how to configure a new log rotation system specific to HiveServer. It also requires us to write, debug, document and maintain code that isn't really necessary. I think the best approach is to take advantage of what already exists by documenting Hive's logging behavior in the Admin manual and providing a sample logrotate configuration file. Thanks. Carl On Tue, Jan 4, 2011 at 9:41 PM, Mohit mohitsi...@huawei.com wrote: hmm, ok , I think the process of creating and cleanup of resources should be the part of the same system, lets not hand it over to cron utility, users might not be knowing or need not to know what files to delete, when to delete, from where to delete. What about a timer task which cleans up these files older than the configured elapsed time say a deleting files an hour old or a week old.? I'm raising new JIRA for this and will provide the patch. Ok, you are talking about HIVE-1708, WELL If it is about changing the file location, one can do that by overriding the property *hive.querylog.location *by adding into hive-default.xml. I will comment on that. -Mohit *** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's) is prohibited. 
If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Tuesday, January 04, 2011 8:03 PM To: mohitsi...@huawei.com Cc: hive-...@hadoop.apache.org; c...@cloudera.com Subject: Re: Regarding Hive History File(s). On Tue, Jan 4, 2011 at 7:03 AM, Mohit mohitsi...@huawei.com wrote: Hello All, What is the purpose of maintaining hive history files which contain session information like session start, query start, query end, task start, task end etc.? Are they being used later (say by a tool) for some purpose? I don't see these files being getting deleted from the system ;any configuration needed to be set to enable deletion or Is there any design strategy/decision/rationale for not deleting them at all? Also, in these files I don't see the session end message being logged, is it reserved for future use? -Mohit *** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! HiveHistory was added a while ago between 3.0 and 4.0 (iirc). A tool to view them is HiveHistoryViewer in the API. I am not exactly sure who is doing what with that data. The Web Interface does use it to provide links to the JobTracker. So it helpful for trying to trace all the dependant jobs of a query after the fact. There is a ticket open to customize the file location. I was also thinking we should allow the user to supply a 'none' to turn off the feature. As for clean up and management cron and rm seem like a good fit.
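For reference, the logrotate approach Carl suggests might look something like the following. This is an untested, illustrative stanza; the glob and path assume hive.querylog.location left at its /tmp/<user> default, and the intent is removal of finished session logs rather than classic rotation of a growing file:

```
# Illustrative logrotate stanza for Hive history files (path assumed).
/tmp/hive/hive_job_log_*.txt {
    daily
    rotate 0
    missingok
    notifempty
}
```

Edward's cron-plus-rm suggestion would achieve the same effect with a scheduled find-and-delete over the same directory; either way the policy lives in standard Ops tooling rather than inside HiveServer.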
[jira] Commented: (HIVE-1881) Add an option to use FsShell to delete dir in warehouse
[ https://issues.apache.org/jira/browse/HIVE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977666#action_12977666 ] Namit Jain commented on HIVE-1881: -- Talking offline with Yongqiang, the facebook specific implementation of this interface need not be checked in open source, nor is there any need to document the new configuration parameter in open source, since this parameter only makes sense in the facebook enviroment Add an option to use FsShell to delete dir in warehouse --- Key: HIVE-1881 URL: https://issues.apache.org/jira/browse/HIVE-1881 Project: Hive Issue Type: Improvement Components: Metastore Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1881.1.patch, HIVE-1881.2.patch @Yongqiang: What's the motivation for doing this? This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop. But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.