[jira] Commented: (HIVE-83) Set up a continuous build of Hive with Hudson

2009-02-11 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672739#action_12672739
 ] 

dhruba borthakur commented on HIVE-83:
--

I have got a Hudson account named dhruba. However, this account will be used 
by Johan to set up the Hive-Hudson builds. 

 Set up a continuous build of Hive with Hudson
 -

 Key: HIVE-83
 URL: https://issues.apache.org/jira/browse/HIVE-83
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Jeff Hammerbacher

 Other projects like Zookeeper and HBase are leveraging Apache's hosted Hudson 
 server (http://hudson.zones.apache.org/hudson/view/HBase). Perhaps Hive 
 should as well?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-276) input3_limit.q fails under 0.17

2009-02-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-276:


Attachment: HIVE-276.1.patch

Modified the query to do another SORT at the end.

I also thought about propagating the sort order to the reduce sink operator of 
limit, but that does not seem very easy to do.


 input3_limit.q fails under 0.17
 ---

 Key: HIVE-276
 URL: https://issues.apache.org/jira/browse/HIVE-276
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
 Attachments: HIVE-276.1.patch


 The plan ql/src/test/results/clientpositive/input3_limit.q.out shows that 
 there are 2 map-reduce jobs:
 The first one is distributed and sorted as specified by the query; the 
 reducer side has LIMIT 20.
 The second one (a single-reducer job imposed by LIMIT 20) does not have the 
 same sort order, so the final result is non-deterministic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-278) Add HiveHistory to Hive web interface

2009-02-11 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672744#action_12672744
 ] 

Ashish Thusoo commented on HIVE-278:


Looked at this with Suresh. We feel that the Session object is being created in 
the SessionManager, whereas the Driver is created in the SessionItem thread. As 
a result the Driver looks at its thread-specific state, does not find a 
Session there, and therefore does not put anything in the log. Suresh pointed 
this out and can chime in to clarify.

But basically we would have to create the Session object and the Driver in the 
same thread. Maybe moving the Driver creation to the SessionManager thread will 
fix this.
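
A minimal sketch (hypothetical names, not Hive's actual classes) of why a 
ThreadLocal-backed session attached on one thread is invisible to a Driver 
created on another thread:

{noformat}
public class ThreadLocalSessionDemo {
  // Hypothetical stand-in for a per-thread session registry that the
  // Driver would consult when writing history.
  private static final ThreadLocal<String> SESSION = new ThreadLocal<String>();

  public static void main(String[] args) throws InterruptedException {
    // The "SessionManager" thread attaches the session to its own thread.
    SESSION.set("hwi-session-1");
    System.out.println("manager thread sees: " + SESSION.get()); // hwi-session-1

    // The "Driver" runs on another thread, finds no session, and logs nothing.
    Thread driverThread = new Thread(new Runnable() {
      public void run() {
        System.out.println("driver thread sees: " + SESSION.get()); // null
      }
    });
    driverThread.start();
    driverThread.join();
  }
}
{noformat}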


 Add HiveHistory to Hive web interface
 -

 Key: HIVE-278
 URL: https://issues.apache.org/jira/browse/HIVE-278
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Logging, Web UI
Affects Versions: 0.2.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.2.0

 Attachments: session_logging.diff


 In order for HIVE-176 to be utilized by the Hive web interface, a few changes 
 need to be made.
 * HWISessionItem needs a method with the signature
 {noformat} 
 public HiveHistoryViewer getHistoryViewer() throws HWIException
 {noformat} 
 * session_manage.jsp needs an addition
 {noformat} 
 Hive History: <a href="/hwi/session_history.jsp?sessionName=<%=sessionName%>"><%=sessionName%></a><br>
 {noformat} 
 * session_history.jsp will have to be created to use the ql.history API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-278) Add HiveHistory to Hive web interface

2009-02-11 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-278:
---

Affects Version/s: (was: 0.2.0)
   0.3.0
Fix Version/s: (was: 0.2.0)
   0.3.0

Updated the versions as HWI is not in 0.2.

 Add HiveHistory to Hive web interface
 -

 Key: HIVE-278
 URL: https://issues.apache.org/jira/browse/HIVE-278
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Logging, Web UI
Affects Versions: 0.3.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.3.0

 Attachments: session_logging.diff


 In order for HIVE-176 to be utilized by the Hive web interface, a few changes 
 need to be made.
 * HWISessionItem needs a method with the signature
 {noformat} 
 public HiveHistoryViewer getHistoryViewer() throws HWIException
 {noformat} 
 * session_manage.jsp needs an addition
 {noformat} 
 Hive History: <a href="/hwi/session_history.jsp?sessionName=<%=sessionName%>"><%=sessionName%></a><br>
 {noformat} 
 * session_history.jsp will have to be created to use the ql.history API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types

2009-02-11 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672764#action_12672764
 ] 

Joydeep Sen Sarma commented on HIVE-270:


looks pretty good!

For the LazyString conversion, you can use the static function Text.decode. 
Text.append will do an unnecessary byte copy into the Text's internal byte 
array.
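
A small sketch of that suggestion, assuming the bytes hold UTF-8 (Text.decode 
and its byte-range overload are Hadoop APIs; the wrapper class is hypothetical):

{noformat}
import java.nio.charset.CharacterCodingException;
import org.apache.hadoop.io.Text;

class LazyStringDecode {
  // Decode a byte range straight to a String, without first appending the
  // bytes into a Text object (which would copy them into Text's buffer).
  static String toString(byte[] bytes, int start, int length)
      throws CharacterCodingException {
    return Text.decode(bytes, start, length);
  }
}
{noformat}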

LazySimpleStructObjectInspector.java: public List<Object> getStructFieldsDataAsList(Object data) {
This is probably not used anywhere, but if data == null it seems like we should 
just return null (based on looking at the other object inspectors).
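
A hedged sketch of that guard (the surrounding class and the Object[] row 
representation are stand-ins, not the real inspector):

{noformat}
import java.util.Arrays;
import java.util.List;

class StructInspectorSketch {
  public List<Object> getStructFieldsDataAsList(Object data) {
    if (data == null) {
      return null; // mirror what the other object inspectors do for null rows
    }
    // Assume the row is already materialized as an Object[] of field values.
    Object[] fields = (Object[]) data;
    return Arrays.asList(fields);
  }
}
{noformat}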

parse(): comparison for the null sequence: the compare method is somewhat generic. 
Since we only care about equality, we can compare the lengths first to detect 
inequality (should be a little faster) and do the full byte-by-byte comparison 
only if the lengths are equal.
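
For illustration, a sketch of that length-first equality check (class and 
method names are hypothetical):

{noformat}
class ByteRangeEquals {
  // Return true iff the two byte ranges hold identical contents.
  // Unequal lengths short-circuit before any byte-by-byte work.
  static boolean rangesEqual(byte[] a, int aStart, int aLen,
                             byte[] b, int bStart, int bLen) {
    if (aLen != bLen) {
      return false;
    }
    for (int i = 0; i < aLen; i++) {
      if (a[aStart + i] != b[bStart + i]) {
        return false;
      }
    }
    return true;
  }
}
{noformat}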

It may be possible to speed up serialization considerably as well (go directly 
from primitive types to bytes and append to a byte buffer), but it would make 
sense to punt on that.

 Add a lazy-deserialized SerDe for space and cpu efficient serialization of 
 rows with primitive types
 

 Key: HIVE-270
 URL: https://issues.apache.org/jira/browse/HIVE-270
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-270.1.patch, HIVE-270.3.patch


 We want to add a lazy-deserialized SerDe for space and cpu efficient 
 serialization of rows with primitive types.
 This SerDe will share the same format as 
 MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible.
 This SerDe will be used to replace the default table SerDe, and the SerDe 
 used to communicate with user scripts.
 For simplicity, we don't plan to support nested structure with this SerDe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Need LOCALTIMESTAMP ?

2009-02-11 Thread Shyam Sarkar
Hello,

Please help me to understand what I am going to implement for Timestamp. Do we 
need LOCALTIMESTAMP implementation? See the comparisons below::

=


LOCALTIMESTAMP
It's often important to get the value of current date and time. Below are the 
functions used to do that in the different implementations.

Standard: The current timestamp (without time zone) is retrieved with the 
LOCALTIMESTAMP function, which may be used as: 
SELECT LOCALTIMESTAMP ...
or
SELECT LOCALTIMESTAMP(precision) ...

Note that SELECT LOCALTIMESTAMP() ... is illegal: if you don't care about the 
precision, then you must not use any parentheses.

If the DBMS supports the non-core time zone features (feature ID F411), then it 
must also provide the functions CURRENT_TIMESTAMP and 
CURRENT_TIMESTAMP(precision) which return a value of type TIMESTAMP WITH TIME 
ZONE. If it doesn't support time zones, then the DBMS must not provide a 
CURRENT_TIMESTAMP function.
 
PostgreSQL: Follows the standard.

DB2: Doesn't have the LOCALTIMESTAMP function. Instead, it provides a special, 
magic value ('special register' in IBM language), CURRENT_TIMESTAMP (alias to 
'CURRENT TIMESTAMP'), which may be used as though it were a function without 
arguments. However, since DB2 doesn't provide TIMESTAMP WITH TIME ZONE support, 
the availability of CURRENT_TIMESTAMP could be said to be against the standard, 
or at least confusing.

MSSQL: Doesn't have the LOCALTIMESTAMP function. Instead, it has 
CURRENT_TIMESTAMP, which, however, doesn't return a value of TIMESTAMP WITH 
TIME ZONE, but rather a value of MSSQL's DATETIME type (which doesn't contain 
time zone information).

MySQL: Follows the standard.

Oracle: Follows the standard.

Informix: On my TODO.



Thanks,
shyam_sar...@yahoo.com







Datetime type in SQL standard

2009-02-11 Thread Shyam Sarkar
Following is the BNF for the datetime type in SQL:2003:

<datetime type> ::=
   DATE
 | TIME [ <left paren> <time precision> <right paren> ] [ <with or without time zone> ]
 | TIMESTAMP [ <left paren> <timestamp precision> <right paren> ] [ <with or without time zone> ]

Please let me know whether we should implement the standard completely or not.

Thanks,
shyam_sar...@yahoo.com



  


RE: Need LOCALTIMESTAMP ?

2009-02-11 Thread Ashish Thusoo
Hi Shyam,

I think HIVE-192 is about the fact that there is no support for the timestamp 
type in Hive (or, for that matter, date and datetime types). In FB we are using 
strings to hold this information. 

If you are planning to add a built-in function like localtimestamp, then that 
should probably go into a different JIRA.

We have tried to adhere to the MySQL way of doing things, as we find more folks 
using it (at least in our company), and it looks from your research like they 
are basically standards compliant. So my vote would be to go with MySQL 
semantics and the CURRENT_TIMESTAMP construct.

Ashish






  


timestamp examples in standard SQL

2009-02-11 Thread Shyam Sarkar
Some examples with timestamp in the SQL standard:
==

Create Table

CREATE TABLE Stu_Table (
 Stu_Id varchar(2),
 Stu_Name varchar(10),
 Stu_Dob timestamp NOT NULL );
 

Insert Date Into Stu_Table

Now the INSERT INTO statement is used to add records (rows) to the table 
'Stu_Table'.

Insert Into Stu_Table Values('1', 'Komal', '1984-10-27');
Insert Into Stu_Table Values('2', 'ajay', '1985-04-19');
Insert Into Stu_Table Values('3', 'Santosh', '1986-11-16');
 

Stu_Table

+--------+----------+---------------------+
| Stu_Id | Stu_Name | Stu_Dob             |
+--------+----------+---------------------+
| 1      | Komal    | 1984-10-27 00:00:00 |
| 2      | ajay     | 1985-04-19 00:00:00 |
| 3      | Santosh  | 1986-11-16 00:00:00 |
+--------+----------+---------------------+
 

Query

The query given below returns the records selected by the SELECT statement. The 
WHERE clause restricts the query to rows whose Stu_Dob values fall between 
'1984-01-01' and '1986-1-1'. 

Select * From Stu_Table
Where Stu_Dob Between '1984-01-01' And '1986-1-1';
 

Result

+--------+----------+---------------------+
| Stu_Id | Stu_Name | Stu_Dob             |
+--------+----------+---------------------+
| 1      | Komal    | 1984-10-27 00:00:00 |
| 2      | ajay     | 1985-04-19 00:00:00 |
+--------+----------+---------------------+
 



  


RE: Need LOCALTIMESTAMP ?

2009-02-11 Thread Shyam Sarkar
Hi Ashish,

Please read about the latest TIMESTAMP implementation in MySQL 5.0 and suggest:

http://dev.mysql.com/doc/refman/5.0/en/timestamp.html

Also please comment on the following MySQL 5.0 implementation semantics:

TIMESTAMP values are converted from the current time zone to UTC for storage, 
and converted back from UTC to the current time zone for retrieval. (This 
occurs only for the TIMESTAMP data type, not for other types such as DATETIME.) 
By default, the current time zone for each connection is the server's time.
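
For illustration, a small java.time sketch (the connection time zone here is an 
assumption) of the store-as-UTC / read-back-in-the-connection-zone round trip 
described above:

{noformat}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampRoundTrip {
  public static void main(String[] args) {
    ZoneId sessionZone = ZoneId.of("America/Los_Angeles"); // assumed connection time zone
    LocalDateTime local = LocalDateTime.of(2009, 2, 11, 14, 37);

    // Stored: the local value converted to UTC.
    Instant storedUtc = local.atZone(sessionZone).toInstant();
    // Retrieved: the UTC value converted back to the connection's time zone.
    LocalDateTime retrieved = LocalDateTime.ofInstant(storedUtc, sessionZone);

    System.out.println(storedUtc);  // 2009-02-11T22:37:00Z
    System.out.println(retrieved);  // 2009-02-11T14:37
  }
}
{noformat}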

Should we do the same thing?

Thanks,
shyam_sar...@yahoo.com



--- On Wed, 2/11/09, Ashish Thusoo athu...@facebook.com wrote:

 From: Ashish Thusoo athu...@facebook.com
 Subject: RE: Need LOCALTIMESTAMP ?
 To: hive-dev@hadoop.apache.org hive-dev@hadoop.apache.org, 
 shyam_sar...@yahoo.com shyam_sar...@yahoo.com
 Date: Wednesday, February 11, 2009, 2:55 PM
 Hi Shyam,
 
 I think HIVE-192 is about the fact that there is no support
 for the timestamp type in Hive (or for that matter date and
 datetime types). In FB we are using strings to hold this
 information. 
 
 If you are planning to add a built in function like
 localtimestamp, then that should probably go into a
 different JIRA.
 
 We have tried to adhere to mysql way of doing things as we
 find more folks using it (at least in our company) and looks
 from your research that they are basically standards
 compliant. So my vote will be to go with mysql semantics and
 CURRENT_TIMESTAMP construct.
 
 Ashish
 
 


  


[jira] Updated: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types

2009-02-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-270:


Attachment: HIVE-270.4.patch

After looking at the code again, I decided to postpone the change of 
serialization until we move from String to Text.

With that change, we will be able to get rid of all UTF-8 encoding in the 
serialization.

This patch incorporates all other comments from Joydeep.


 Add a lazy-deserialized SerDe for space and cpu efficient serialization of 
 rows with primitive types
 

 Key: HIVE-270
 URL: https://issues.apache.org/jira/browse/HIVE-270
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-270.1.patch, HIVE-270.3.patch, HIVE-270.4.patch


 We want to add a lazy-deserialized SerDe for space and cpu efficient 
 serialization of rows with primitive types.
 This SerDe will share the same format as 
 MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible.
 This SerDe will be used to replace the default table SerDe, and the SerDe 
 used to communicate with user scripts.
 For simplicity, we don't plan to support nested structure with this SerDe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-270) Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types

2009-02-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-270:


Attachment: HIVE-270.5.patch

Removed all test case changes (can be automatically generated by ant test 
-Doverwrite=true)


 Add a lazy-deserialized SerDe for space and cpu efficient serialization of 
 rows with primitive types
 

 Key: HIVE-270
 URL: https://issues.apache.org/jira/browse/HIVE-270
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-270.1.patch, HIVE-270.3.patch, HIVE-270.4.patch, 
 HIVE-270.5.patch


 We want to add a lazy-deserialized SerDe for space and cpu efficient 
 serialization of rows with primitive types.
 This SerDe will share the same format as 
 MetadataTypedColumnsetSerDe/TCTLSeparatedProtocol to be backward compatible.
 This SerDe will be used to replace the default table SerDe, and the SerDe 
 used to communicate with user scripts.
 For simplicity, we don't plan to support nested structure with this SerDe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

2009-02-11 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-131:
---

Attachment: hive-131.patch.2

Dhruba said:

 1. I see that execute returns values 1, 2, and 3. It will be good to document 
 what these values mean.
 2. Starting with Hadoop 0.19, it might make sense to set FileSystem.deleteOnExit() 
 for files that are temporary.
 3. It is interesting to note that now there is an extra step, jobClose(), that 
 gets triggered on the client side after the job is complete. Prior to this 
 patch, a job would be successful even if the client side had disappeared 
 before the job completed. This patch requires that the client remain 
 active and healthy till the entire job is complete. This is probably OK for 
 Hive, especially because Hive requires job chaining anyway, and I do not see 
 any other way to do it.

- Incorporated the suggestion to use deleteOnExit where available (see the 
sketch below).
- Return codes are always accompanied by a corresponding message on the 
console/log, so I don't see much point in creating additional documentation 
around them.
- Hive has always depended on the client-side code path for query completion.
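
A minimal sketch of that deleteOnExit usage, assuming Hadoop 0.19+ (the helper 
class and path handling are placeholders, not the patch itself):

{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class TmpCleanupSketch {
  // Mark a temporary output path so the FileSystem client removes it
  // when the client JVM exits normally.
  static void markTmpForCleanup(Configuration conf, Path tmpPath) throws IOException {
    FileSystem fs = tmpPath.getFileSystem(conf);
    if (fs.exists(tmpPath)) {
      fs.deleteOnExit(tmpPath);
    }
  }
}
{noformat}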

 insert overwrite directory leaves behind uncommitted/tmp files from failed 
 tasks
 

 Key: HIVE-131
 URL: https://issues.apache.org/jira/browse/HIVE-131
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
Priority: Critical
 Attachments: HIVE-131.patch.1, hive-131.patch.2


 _tmp files are getting left behind on insert overwrite directory:
 /user/jssarma/ctst1/40422_m_000195_0.deflate  r 3 13285 2008-12-07 01:47  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/40422_m_000196_0.deflate  r 3 3055  2008-12-07 01:46  
 rw-r--r-- jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_33_0 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 /user/jssarma/ctst1/_tmp.40422_m_37_1 r 3 0 2008-12-07 01:53  rw-r--r-- 
 jssarma supergroup
 This happened with speculative execution. The code looks good (in fact, in 
 this case many speculative tasks were launched and only a couple caused 
 problems). It almost seems like these files did not appear in the namespace 
 until after the map-reduce job finished and the MoveTask did a listing of the 
 output dir ..

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.