[jira] Commented: (HIVE-83) Set up a continuous build of Hive with Hudson
[ https://issues.apache.org/jira/browse/HIVE-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672739#action_12672739 ]

dhruba borthakur commented on HIVE-83:

I have a Hudson account named dhruba; however, this account will be used by Johan to set up the Hive-Hudson builds.

Set up a continuous build of Hive with Hudson
Key: HIVE-83
URL: https://issues.apache.org/jira/browse/HIVE-83
Project: Hadoop Hive
Issue Type: Task
Components: Build Infrastructure
Reporter: Jeff Hammerbacher

Other projects, like ZooKeeper and HBase, are leveraging Apache's hosted Hudson server (http://hudson.zones.apache.org/hudson/view/HBase). Perhaps Hive should as well?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-276) input3_limit.q fails under 0.17
[ https://issues.apache.org/jira/browse/HIVE-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-276:

Attachment: HIVE-276.1.patch

Modified the query to do another SORT at the end. I also thought about propagating the sort order to the reduce sink operator of LIMIT, but that does not seem easy to do.

input3_limit.q fails under 0.17
Key: HIVE-276
URL: https://issues.apache.org/jira/browse/HIVE-276
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
Attachments: HIVE-276.1.patch

The plan ql/src/test/results/clientpositive/input3_limit.q.out shows that there are two map-reduce jobs. The first one is distributed and sorted as specified by the query, and its reducer side applies LIMIT 20. The second one (a single-reducer job imposed by LIMIT 20) does not preserve the same sort order, so the final result is non-deterministic.
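The non-determinism described in the bug can be sketched outside Hive. This illustrative Java snippet (not Hive code; the class and method names are invented) shows why taking the first N rows in arrival order varies with partition ordering, while sorting before applying the limit does not:

```java
import java.util.*;

public class LimitDeterminism {
    // Simulates the second MR job applying LIMIT in arrival order:
    // the order in which partitions arrive is not guaranteed.
    static List<Integer> limitOnly(List<List<Integer>> partitions, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> p : partitions)
            for (int v : p)
                if (out.size() < limit) out.add(v);
        return out;
    }

    // Re-sorting before the limit makes the result independent of arrival order.
    static List<Integer> sortThenLimit(List<List<Integer>> partitions, int limit) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : partitions) all.addAll(p);
        Collections.sort(all);
        return all.subList(0, Math.min(limit, all.size()));
    }

    public static void main(String[] args) {
        List<List<Integer>> arrival1 = Arrays.asList(Arrays.asList(1, 3), Arrays.asList(2, 4));
        List<List<Integer>> arrival2 = Arrays.asList(Arrays.asList(2, 4), Arrays.asList(1, 3));
        // Same rows, different arrival order: LIMIT alone gives different answers,
        assert !limitOnly(arrival1, 2).equals(limitOnly(arrival2, 2));
        // but sorting before the limit is deterministic.
        assert sortThenLimit(arrival1, 2).equals(sortThenLimit(arrival2, 2));
    }
}
```

This is the same trade-off the patch makes: adding another SORT at the end costs a pass over the data but pins down the output.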
[jira] Commented: (HIVE-278) Add HiveHistory to Hive web interface
[ https://issues.apache.org/jira/browse/HIVE-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672744#action_12672744 ]

Ashish Thusoo commented on HIVE-278:

Looked at this with Suresh. We feel that the Session object is being created in the SessionManager, whereas the Driver is created in the SessionItem thread. As a result, the Driver looks at its thread-specific state, does not find a Session there, and therefore does not put anything in the log. Suresh pointed this out and can chime in to clarify. Basically, we would have to create the Session object and the Driver in the same thread; moving the Driver creation to the SessionManager thread may fix this.

Add HiveHistory to Hive web interface
Key: HIVE-278
URL: https://issues.apache.org/jira/browse/HIVE-278
Project: Hadoop Hive
Issue Type: New Feature
Components: Logging, Web UI
Affects Versions: 0.2.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
Fix For: 0.2.0
Attachments: session_logging.diff

In order for HIVE-176 to be utilized by the Hive web interface, a few changes need to be made:
* HWISessionItem needs a method with the signature {noformat} public HiveHistoryViewer getHistoryViewer() throws HWIException {noformat}
* session_manage.jsp needs an addition: {noformat} Hive History: <a href="/hwi/session_history.jsp?sessionName=<%=sessionName%>"><%=sessionName%></a><br> {noformat}
* session_history.jsp will have to be created to use the ql.history API
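A minimal illustration of the thread-local pitfall described in the comment (the names here are invented for the sketch, not Hive's actual SessionState or Driver classes): state registered via a ThreadLocal in one thread is invisible to a Driver-like object running in another thread.

```java
public class ThreadLocalPitfall {
    static final ThreadLocal<String> SESSION = new ThreadLocal<>();

    // Stands in for Driver: it can only see the Session registered on the
    // thread it happens to run in.
    static String driverSeesSession() {
        return SESSION.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SESSION.set("session-1");  // registered in the "SessionManager" thread
        final String[] seenByWorker = new String[1];
        Thread worker = new Thread(() -> seenByWorker[0] = driverSeesSession());
        worker.start();
        worker.join();
        assert "session-1".equals(driverSeesSession()); // same thread: visible
        assert seenByWorker[0] == null;                 // other thread: nothing there
    }
}
```

This is why creating the Session and the Driver in the same thread (as suggested above) makes the history log entries appear.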
[jira] Updated: (HIVE-278) Add HiveHistory to Hive web interface
[ https://issues.apache.org/jira/browse/HIVE-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-278:

Affects Version/s: (was: 0.2.0) 0.3.0
Fix Version/s: (was: 0.2.0) 0.3.0

Updated versions, as HWI is not in 0.2.

Add HiveHistory to Hive web interface
Key: HIVE-278
URL: https://issues.apache.org/jira/browse/HIVE-278
Project: Hadoop Hive
Issue Type: New Feature
Components: Logging, Web UI
Affects Versions: 0.3.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
Fix For: 0.3.0
Attachments: session_logging.diff

In order for HIVE-176 to be utilized by the Hive web interface, a few changes need to be made:
* HWISessionItem needs a method with the signature {noformat} public HiveHistoryViewer getHistoryViewer() throws HWIException {noformat}
* session_manage.jsp needs an addition: {noformat} Hive History: <a href="/hwi/session_history.jsp?sessionName=<%=sessionName%>"><%=sessionName%></a><br> {noformat}
* session_history.jsp will have to be created to use the ql.history API
[jira] Commented: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
[ https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672764#action_12672764 ]

Joydeep Sen Sarma commented on HIVE-270:

Looks pretty good!

For the LazyString conversion: can use the static function Text.decode. Text.append will do an unnecessary byte copy into the Text's internal byte array.

LazySimpleStructObjectInspector.java: public List<Object> getStructFieldsDataAsList(Object data) — this is probably not used anywhere, but if data == null it seems like we should just return null (based on looking at other object inspectors).

parse(): for the null-sequence comparison, the compare method is somewhat generic. Since we only care about equality, we can do a simple comparison of the lengths first to detect inequality (should be a little faster) and do the full comparison only if the lengths are equal.

It may be possible to speed up serialize considerably as well (go directly from primitive types to bytes and append to a byte buffer), but it would make sense to punt on that.

Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
Key: HIVE-270
URL: https://issues.apache.org/jira/browse/HIVE-270
Project: Hadoop Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: HIVE-270.1.patch, HIVE-270.3.patch

We want to add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types. This SerDe will share the same format as MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible. It will be used to replace the default table SerDe and the SerDe used to communicate with user scripts. For simplicity, we don't plan to support nested structures with this SerDe.
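The length-first equality suggestion might look like the following hypothetical sketch (not the actual patch code): for a pure equality test of two byte ranges, such as checking a field against the null sequence, a length mismatch rejects immediately without scanning any bytes.

```java
public class ByteRangeEquals {
    // Equality-only comparison of two byte ranges: compare lengths first,
    // then fall through to a byte-by-byte scan only if they match.
    static boolean rangesEqual(byte[] a, int aStart, int aLen,
                               byte[] b, int bStart, int bLen) {
        if (aLen != bLen) return false;          // cheap rejection, no scan
        for (int i = 0; i < aLen; i++)
            if (a[aStart + i] != b[bStart + i]) return false;
        return true;
    }

    public static void main(String[] args) {
        byte[] nullSeq = "\\N".getBytes();            // two bytes: '\' and 'N'
        byte[] row = "a\u0001\\N\u0001b".getBytes();  // ctrl-A separated fields
        assert rangesEqual(row, 2, 2, nullSeq, 0, 2);  // second field == "\N"
        assert !rangesEqual(row, 0, 1, nullSeq, 0, 2); // lengths differ: false
    }
}
```

A generic three-way compare must scan bytes to produce an ordering even when lengths differ, which is the wasted work this avoids.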
Need LOCALTIMESTAMP ?
Hello,

Please help me to understand what I am going to implement for Timestamp. Do we need a LOCALTIMESTAMP implementation? See the comparison below:

LOCALTIMESTAMP

It's often important to get the value of the current date and time. Below are the functions used to do that in the different implementations.

Standard: The current timestamp (without time zone) is retrieved with the LOCALTIMESTAMP function, which may be used as SELECT LOCALTIMESTAMP ... or SELECT LOCALTIMESTAMP(precision) ... Note that SELECT LOCALTIMESTAMP() ... is illegal: if you don't care about the precision, then you must not use any parentheses. If the DBMS supports the non-core time zone features (feature ID F411), then it must also provide the functions CURRENT_TIMESTAMP and CURRENT_TIMESTAMP(precision), which return a value of type TIMESTAMP WITH TIME ZONE. If it doesn't support time zones, then the DBMS must not provide a CURRENT_TIMESTAMP function.

PostgreSQL: Follows the standard.
DB2: Doesn't have the LOCALTIMESTAMP function. Instead, it provides a special, magic value (a 'special register' in IBM language), CURRENT_TIMESTAMP (alias 'CURRENT TIMESTAMP'), which may be used as though it were a function without arguments. However, since DB2 doesn't provide TIMESTAMP WITH TIME ZONE support, the availability of CURRENT_TIMESTAMP could be said to be against the standard, or at least confusing.
MSSQL: Doesn't have the LOCALTIMESTAMP function. Instead, it has CURRENT_TIMESTAMP, which, however, doesn't return a value of TIMESTAMP WITH TIME ZONE, but rather a value of MSSQL's DATETIME type (which doesn't contain time zone information).
MySQL: Follows the standard.
Oracle: Follows the standard.
Informix: On my TODO.

Thanks,
shyam_sar...@yahoo.com
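For a concrete picture of the with/without time zone distinction the standard draws, here is a small java.time sketch (java.time is used purely for illustration and is not part of Hive): LOCALTIMESTAMP corresponds to a timestamp without time zone (LocalDateTime), while the standard's CURRENT_TIMESTAMP carries a zone offset (OffsetDateTime).

```java
import java.time.*;

public class TimestampKinds {
    // LOCALTIMESTAMP-like: a wall-clock value with no zone information.
    static LocalDateTime stripZone(OffsetDateTime withZone) {
        return withZone.toLocalDateTime();
    }

    public static void main(String[] args) {
        // CURRENT_TIMESTAMP-like value: carries an offset (here, UTC-8).
        OffsetDateTime zoned =
            OffsetDateTime.of(2009, 2, 11, 14, 37, 0, 0, ZoneOffset.ofHours(-8));
        // Dropping the offset yields the corresponding LOCALTIMESTAMP value.
        assert stripZone(zoned).equals(LocalDateTime.of(2009, 2, 11, 14, 37));
    }
}
```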
Datetime type in SQL standard
Following is the BNF for the datetime type in SQL:2003:

<datetime type> ::=
    DATE
  | TIME [ <left paren> <time precision> <right paren> ] [ <with or without time zone> ]
  | TIMESTAMP [ <left paren> <timestamp precision> <right paren> ] [ <with or without time zone> ]

Please let me know whether or not we should implement the standard completely.

Thanks,
shyam_sar...@yahoo.com
RE: Need LOCALTIMESTAMP ?
Hi Shyam,

I think HIVE-192 is about the fact that there is no support for the timestamp type in Hive (or, for that matter, date and datetime types). In FB we are using strings to hold this information. If you are planning to add a built-in function like LOCALTIMESTAMP, then that should probably go into a different JIRA.

We have tried to adhere to the MySQL way of doing things, as we find more folks using it (at least in our company), and it looks from your research that they are basically standards-compliant. So my vote would be to go with MySQL semantics and the CURRENT_TIMESTAMP construct.

Ashish

-----Original Message-----
From: Shyam Sarkar [mailto:shyam_sar...@yahoo.com]
Sent: Wednesday, February 11, 2009 2:37 PM
To: hive-dev@hadoop.apache.org
Subject: Need LOCALTIMESTAMP ?
timestamp examples in standard SQL
Some examples with timestamp in standard SQL:

Create Table:

CREATE TABLE Stu_Table (
  Stu_Id varchar(2),
  Stu_Name varchar(10),
  Stu_Dob timestamp NOT NULL
);

Insert Data Into Stu_Table: The INSERT INTO statement is used to add records (rows) to the table 'Stu_Table'.

Insert Into Stu_Table Values('1', 'Komal', '1984-10-27');
Insert Into Stu_Table Values('2', 'ajay', '1985-04-19');
Insert Into Stu_Table Values('3', 'Santosh', '1986-11-16');

Stu_Table:
+--------+----------+---------------------+
| Stu_Id | Stu_Name | Stu_Dob             |
+--------+----------+---------------------+
| 1      | Komal    | 1984-10-27 00:00:00 |
| 2      | ajay     | 1985-04-19 00:00:00 |
| 3      | Santosh  | 1986-11-16 00:00:00 |
+--------+----------+---------------------+

Query: The query below returns the records selected by the select statement. The WHERE clause restricts the query to records whose Stu_Dob column falls between '1984-01-01' and '1986-1-1'.

Select * From Stu_Table Where Stu_Dob Between '1984-01-01' And '1986-1-1';

Result:
+--------+----------+---------------------+
| Stu_Id | Stu_Name | Stu_Dob             |
+--------+----------+---------------------+
| 1      | Komal    | 1984-10-27 00:00:00 |
| 2      | ajay     | 1985-04-19 00:00:00 |
+--------+----------+---------------------+
RE: Need LOCALTIMESTAMP ?
Hi Ashish,

Please read about the latest TIMESTAMP implementation in MySQL 5.0 and suggest: http://dev.mysql.com/doc/refman/5.0/en/timestamp.html

Also, please comment on the following MySQL 5.0 implementation semantics:

TIMESTAMP values are converted from the current time zone to UTC for storage, and converted back from UTC to the current time zone for retrieval. (This occurs only for the TIMESTAMP data type, not for other types such as DATETIME.) By default, the current time zone for each connection is the server's time.

Should we do the same thing?

Thanks,
shyam_sar...@yahoo.com

--- On Wed, 2/11/09, Ashish Thusoo <athu...@facebook.com> wrote:
From: Ashish Thusoo <athu...@facebook.com>
To: hive-dev@hadoop.apache.org, shyam_sar...@yahoo.com
Date: Wednesday, February 11, 2009, 2:55 PM
Subject: RE: Need LOCALTIMESTAMP ?
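The quoted MySQL semantics can be sketched with java.time (illustrative only; the store/retrieve helper names are invented): the session's wall-clock value is converted to a UTC instant for storage and back to the session zone on retrieval, so the stored value is zone-independent.

```java
import java.time.*;

public class UtcStorage {
    // Storage side: interpret the wall-clock value in the session's zone
    // and keep the resulting UTC instant.
    static Instant store(LocalDateTime wallClock, ZoneId sessionZone) {
        return wallClock.atZone(sessionZone).toInstant();
    }

    // Retrieval side: render the stored UTC instant in the session's zone.
    static LocalDateTime retrieve(Instant stored, ZoneId sessionZone) {
        return LocalDateTime.ofInstant(stored, sessionZone);
    }

    public static void main(String[] args) {
        ZoneId pst = ZoneId.of("America/Los_Angeles");
        LocalDateTime written = LocalDateTime.of(2009, 2, 11, 14, 37);
        Instant stored = store(written, pst);
        // Round-trips in the same zone...
        assert retrieve(stored, pst).equals(written);
        // ...and a session in another zone sees the same instant, shifted
        // (Feb 11 is PST, UTC-8, so 14:37 local is 22:37 UTC).
        assert retrieve(stored, ZoneId.of("UTC")).equals(LocalDateTime.of(2009, 2, 11, 22, 37));
    }
}
```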
[jira] Updated: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
[ https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-270:

Attachment: HIVE-270.4.patch

After looking at the code again, I decided to postpone the change to serialization until we move from String to Text. With that change, we will be able to get rid of all UTF-8 encoding in the serialization. This patch incorporates all other comments from Joydeep.

Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
Key: HIVE-270
URL: https://issues.apache.org/jira/browse/HIVE-270
Project: Hadoop Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: HIVE-270.1.patch, HIVE-270.3.patch, HIVE-270.4.patch

We want to add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types. This SerDe will share the same format as MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible. It will be used to replace the default table SerDe and the SerDe used to communicate with user scripts. For simplicity, we don't plan to support nested structures with this SerDe.
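The point about getting rid of UTF-8 encoding can be illustrated with a small sketch (the helper names are hypothetical; this is not the patch itself): a String-backed value must be re-encoded to bytes on every serialize call, while a Text-like holder that already keeps UTF-8 bytes can be copied verbatim.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesVsString {
    // String path: a char -> UTF-8 encode happens on every serialize.
    static byte[] serializeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Text-like path: the bytes are the representation, so serializing
    // is a plain copy with no re-encoding.
    static byte[] serializeBytes(byte[] alreadyUtf8) {
        return Arrays.copyOf(alreadyUtf8, alreadyUtf8.length);
    }

    public static void main(String[] args) {
        byte[] viaString = serializeString("héllo");
        byte[] viaBytes = serializeBytes("héllo".getBytes(StandardCharsets.UTF_8));
        assert Arrays.equals(viaString, viaBytes);  // identical wire format
    }
}
```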
[jira] Updated: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
[ https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-270:

Attachment: HIVE-270.5.patch

Removed all test case changes (they can be automatically generated by ant test -Doverwrite=true).

Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types
Key: HIVE-270
URL: https://issues.apache.org/jira/browse/HIVE-270
Project: Hadoop Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: HIVE-270.1.patch, HIVE-270.3.patch, HIVE-270.4.patch, HIVE-270.5.patch

We want to add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types. This SerDe will share the same format as MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible. It will be used to replace the default table SerDe and the SerDe used to communicate with user scripts. For simplicity, we don't plan to support nested structures with this SerDe.
[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
[ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-131:

Attachment: hive-131.patch.2

Dhruba said:
1. I see that execute returns values 1, 2, and 3. It would be good to document what these values mean.
2. Starting with Hadoop 0.19, it might make sense to set FileSystem.deleteOnExit() for files that are temporary.
3. It is interesting to note that there is now an extra step, jobClose(), that gets triggered on the client side after the job is complete. Prior to this patch, a job would be successful even if the client side had disappeared before the job completed. This patch requires that the client remain active and healthy until the entire job is complete. This is probably ok for Hive, especially because Hive requires job-chaining anyway, and I do not see any other way to do it.

- Incorporated the suggestion to use deleteOnExit where available.
- Return codes are always accompanied by a corresponding message on the console/log, so I don't see much point in creating additional documentation around them.
- Hive has always depended on the client-side code path for query completion.

insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
Key: HIVE-131
URL: https://issues.apache.org/jira/browse/HIVE-131
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Priority: Critical
Attachments: HIVE-131.patch.1, hive-131.patch.2

_tmp files are getting left behind on insert overwrite directory:

-rw-r--r--   3 jssarma supergroup  13285 2008-12-07 01:47 /user/jssarma/ctst1/40422_m_000195_0.deflate
-rw-r--r--   3 jssarma supergroup   3055 2008-12-07 01:46 /user/jssarma/ctst1/40422_m_000196_0.deflate
-rw-r--r--   3 jssarma supergroup      0 2008-12-07 01:53 /user/jssarma/ctst1/_tmp.40422_m_33_0
-rw-r--r--   3 jssarma supergroup      0 2008-12-07 01:53 /user/jssarma/ctst1/_tmp.40422_m_37_1

This happened with speculative execution. The code looks good (in fact, in this case many speculative tasks were launched, and only a couple caused problems). It almost seems like these files did not appear in the namespace until after the map-reduce job finished and the MoveTask did a listing of the output dir...
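The deleteOnExit suggestion from point 2 can be sketched with java.io.File, which offers the same register-then-cleanup idiom as Hadoop 0.19's FileSystem.deleteOnExit(Path) for HDFS (the helper name below is invented for the sketch):

```java
import java.io.File;
import java.io.IOException;

public class TmpCleanup {
    // Create a scratch file and register it for JVM-exit cleanup, so an
    // abnormal client exit cannot leave the file behind.
    static boolean scratchRegistered(String prefix) {
        try {
            File tmp = File.createTempFile(prefix, ".deflate");
            tmp.deleteOnExit();  // removed even if the commit step never runs
            return tmp.exists();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        assert scratchRegistered("_tmp.task_");
        // ... write task output here, then rename into place on success ...
    }
}
```

Registering cleanup up front, rather than deleting in a finally block, covers the case the bug describes: the client process dying between task completion and the final move/commit.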