[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-675:


   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: (was: 0.6.0)
   Resolution: Fixed

Committed in trunk - Thanks Carl

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1600) need to sort hook input/output lists for test result determinism

2010-08-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903182#action_12903182
 ] 

Namit Jain commented on HIVE-1600:
--

Do you want to make inputs and outputs in BaseSemanticAnalyzer to 
SortedSet/TreeSet instead ?
This way we can be deterministic across all hooks.
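A minimal sketch of that suggestion: collecting entities in a TreeSet makes iteration order deterministic regardless of insertion order. The String stand-in below is an assumption for illustration; Hive's actual ReadEntity/WriteEntity classes would need to be Comparable or be given an explicit Comparator.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DeterministicHooks {
    // A TreeSet yields sorted iteration order no matter what order the
    // planner adds entities in. String stands in for Hive's entity types.
    public static Set<String> collect(List<String> entities) {
        return new TreeSet<>(entities);
    }

    public static void main(String[] args) {
        Set<String> outputs = collect(List.of(
                "default@nzhang_part2@ds=2008-12-31/hr=12",
                "default@nzhang_part1@ds=2008-04-08/hr=11"));
        // Iteration order is now lexicographic, independent of insert order.
        System.out.println(outputs);
    }
}
```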

> need to sort hook input/output lists for test result determinism
> 
>
> Key: HIVE-1600
> URL: https://issues.apache.org/jira/browse/HIVE-1600
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1600.1.patch
>
>
> Begin forwarded message:
> From: Ning Zhang 
> Date: August 26, 2010 2:47:26 PM PDT
> To: John Sichi 
> Cc: "hive-dev@hadoop.apache.org" 
> Subject: Re: failure in load_dyn_part1.q
> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> JVS




[jira] Commented: (HIVE-1600) need to sort hook input/output lists for test result determinism

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903183#action_12903183
 ] 

John Sichi commented on HIVE-1600:
--

Blocked on HIVE-1601 for regenerating 0.17 test output.


> need to sort hook input/output lists for test result determinism
> 
>
> Key: HIVE-1600
> URL: https://issues.apache.org/jira/browse/HIVE-1600
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1600.1.patch
>
>
> Begin forwarded message:
> From: Ning Zhang 
> Date: August 26, 2010 2:47:26 PM PDT
> To: John Sichi 
> Cc: "hive-dev@hadoop.apache.org" 
> Subject: Re: failure in load_dyn_part1.q
> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> JVS




[jira] Commented: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903181#action_12903181
 ] 

John Sichi commented on HIVE-1601:
--

This JIRA is in honor of Facebook HQ.


> Hadoop 0.17 ant test broken by HIVE-1523
> 
>
> Key: HIVE-1601
> URL: https://issues.apache.org/jira/browse/HIVE-1601
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
>
> compile-test:
>[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set 
> to false for repeatable builds
>[javac] Compiling 33 source files to 
> /data/users/jsichi/open/hive-trunk/build/ql/test/classes
> BUILD FAILED
> /data/users/jsichi/open/hive-trunk/build.xml:168: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build.xml:105: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build-common.xml:304: 
> /data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1
>  does not exist.




[jira] Created: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523

2010-08-26 Thread John Sichi (JIRA)
Hadoop 0.17 ant test broken by HIVE-1523


 Key: HIVE-1601
 URL: https://issues.apache.org/jira/browse/HIVE-1601
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Joydeep Sen Sarma
 Fix For: 0.7.0



compile-test:
   [javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to 
false for repeatable builds
   [javac] Compiling 33 source files to 
/data/users/jsichi/open/hive-trunk/build/ql/test/classes

BUILD FAILED
/data/users/jsichi/open/hive-trunk/build.xml:168: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build.xml:105: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build-common.xml:304: 
/data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1 
does not exist.




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903180#action_12903180
 ] 

Namit Jain commented on HIVE-675:
-

My bad in applying the patch - running tests again

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.




[jira] Assigned: (HIVE-1591) Lock the database also as part of locking a table/partition

2010-08-26 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1591:


Assignee: Namit Jain

> Lock the database also as part of locking a table/partition
> ---
>
> Key: HIVE-1591
> URL: https://issues.apache.org/jira/browse/HIVE-1591
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1591.1.patch
>
>
> Drop database should fail if a table in that database is being queried.




[jira] Updated: (HIVE-1600) need to sort hook input/output lists for test result determinism

2010-08-26 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1600:
-

Attachment: HIVE-1600.1.patch

Patch file of code only; still working on regenerating test logs.


> need to sort hook input/output lists for test result determinism
> 
>
> Key: HIVE-1600
> URL: https://issues.apache.org/jira/browse/HIVE-1600
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1600.1.patch
>
>
> Begin forwarded message:
> From: Ning Zhang 
> Date: August 26, 2010 2:47:26 PM PDT
> To: John Sichi 
> Cc: "hive-dev@hadoop.apache.org" 
> Subject: Re: failure in load_dyn_part1.q
> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> JVS




RE: failure in load_dyn_part1.q

2010-08-26 Thread Namit Jain
I think I forgot to run the tests for HIVE-1523 on Hadoop 0.17.
The jsps were added as part of that change.




-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Thursday, August 26, 2010 6:41 PM
To: 
Cc: Ning Zhang; Joydeep Sen Sarma
Subject: Re: failure in load_dyn_part1.q

Now I'm getting this failure while trying to update the logs for 0.17:

compile-test:
[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to 
false for repeatable builds
[javac] Compiling 33 source files to 
/data/users/jsichi/open/hive-trunk/build/ql/test/classes

BUILD FAILED
/data/users/jsichi/open/hive-trunk/build.xml:168: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build.xml:105: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build-common.xml:304: 
/data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1 
does not exist.

Maybe it's environmental, but I noticed that Hudson 0.17 build started failing 
too with exactly the same exception (whereas 0.20 is fine):

https://hudson.apache.org/hudson/job/Hive-trunk-h0.17/changes
https://hudson.apache.org/hudson/job/Hive-trunk-h0.17/lastCompletedBuild/console

Did HIVE-1523 commit tests pass with 0.17?

JVS

On Aug 26, 2010, at 2:48 PM, John Sichi wrote:

> OK, I'll create a patch for sorting them in Pre/PostExecutePrinter.
> 
> JVS
> 
> On Aug 26, 2010, at 2:47 PM, Ning Zhang wrote:
> 
>> Yes I saw this error before but if it does not repro. So it's probably an 
>> ordering issue in POSTHOOK. 
>> 
>> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
>> 
>>> I'm seeing this failure due to a result diff when running tests on latest 
>>> trunk:
>>> 
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
>>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
>>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
>>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>>> 
>>> Did something change recently?  Or are we missing a Java-level sort on the 
>>> input/output list for determinism?
>>> 
>>> JVS
>>> 
>> 
> 



Re: failure in load_dyn_part1.q

2010-08-26 Thread John Sichi
Now I'm getting this failure while trying to update the logs for 0.17:

compile-test:
[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to 
false for repeatable builds
[javac] Compiling 33 source files to 
/data/users/jsichi/open/hive-trunk/build/ql/test/classes

BUILD FAILED
/data/users/jsichi/open/hive-trunk/build.xml:168: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build.xml:105: The following error occurred 
while executing this line:
/data/users/jsichi/open/hive-trunk/build-common.xml:304: 
/data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1 
does not exist.

Maybe it's environmental, but I noticed that Hudson 0.17 build started failing 
too with exactly the same exception (whereas 0.20 is fine):

https://hudson.apache.org/hudson/job/Hive-trunk-h0.17/changes
https://hudson.apache.org/hudson/job/Hive-trunk-h0.17/lastCompletedBuild/console

Did HIVE-1523 commit tests pass with 0.17?

JVS

On Aug 26, 2010, at 2:48 PM, John Sichi wrote:

> OK, I'll create a patch for sorting them in Pre/PostExecutePrinter.
> 
> JVS
> 
> On Aug 26, 2010, at 2:47 PM, Ning Zhang wrote:
> 
>> Yes I saw this error before but if it does not repro. So it's probably an 
>> ordering issue in POSTHOOK. 
>> 
>> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
>> 
>>> I'm seeing this failure due to a result diff when running tests on latest 
>>> trunk:
>>> 
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
>>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
>>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
>>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
>>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>>> 
>>> Did something change recently?  Or are we missing a Java-level sort on the 
>>> input/output list for determinism?
>>> 
>>> JVS
>>> 
>> 
> 



RE: Deserializing map column via JDBC (HIVE-1378)

2010-08-26 Thread Steven Wong
That sounds like it'll work, at least conceptually. But if the row contains 
primitive and non-primitive columns, the row serialization will be a mix of 
non-JSON and JSON serializations, right? Is that a good thing?


From: John Sichi [mailto:jsi...@facebook.com]
Sent: Thursday, August 26, 2010 12:11 PM
To: Steven Wong
Cc: Zheng Shao; hive-dev@hadoop.apache.org; Jerome Boulon
Subject: Re: Deserializing map column via JDBC (HIVE-1378)

If you replace DynamicSerDe with LazySimpleSerDe on the JDBC client side, can't 
you then tell it to expect JSON serialization for the maps?  That way you can 
leave the FetchTask server side as is.

JVS

On Aug 24, 2010, at 2:50 PM, Steven Wong wrote:


I got sidetracked for a while.

Looking at client.fetchOne, it is a call to the Hive server, which shows the 
following call stack:

SerDeUtils.getJSONString(Object, ObjectInspector) line: 205
LazySimpleSerDe.serialize(Object, ObjectInspector) line: 420
FetchTask.fetch(ArrayList) line: 130
Driver.getResults(ArrayList) line: 660
HiveServer$HiveServerHandler.fetchOne() line: 238

In other words, FetchTask.mSerde (an instance of LazySimpleSerDe) serializes 
the map column into JSON strings. It's because FetchTask.mSerde has been 
initialized by FetchTask.initialize to do it that way.

It appears that the fix is to initialize FetchTask.mSerde differently to do 
ctrl-serialization instead - presumably for the JDBC use case only and not for 
other use cases of FetchTask. Further, it appears that FetchTask.mSerde will do 
ctrl-serialization if it is initialized (via the properties "columns" and 
"columns.types") with the proper schema.

Are these right? Pointers on how to get the proper schema? (From 
FetchTask.work?) And on how to restrict the change to JDBC only? (I have no 
idea.)

For symmetry, LazySimpleSerDe should be used to do ctrl-deserialization on the 
client side, per Zheng's suggestion.

Steven
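
For reference, the ctrl-separation being discussed uses Hive's default delimiters: \u0001 (^A) between columns, \u0002 (^B) between collection items, and \u0003 (^C) between a map key and its value. The toy parser below is only an illustration of that wire format, not the LazySimpleSerDe code path itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CtrlSeparatedRow {
    // Hive's default delimiters: ^A between columns, ^B between map
    // entries, ^C between a map key and its value.
    static final String FIELD = "\u0001";
    static final String COLL = "\u0002";
    static final String KV = "\u0003";

    // Parse one map-typed column out of a ctrl-separated row.
    public static Map<String, String> parseMapColumn(String row, int colIndex) {
        String col = row.split(FIELD)[colIndex];
        Map<String, String> m = new LinkedHashMap<>();
        for (String entry : col.split(COLL)) {
            String[] kv = entry.split(KV, 2);
            if (kv.length == 2) {
                m.put(kv[0], kv[1]);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // The row from the example: a map column, a bigint, a string.
        String row = "a" + KV + "b" + COLL + "x" + KV + "y"
                + FIELD + "123" + FIELD + "abc";
        System.out.println(parseMapColumn(row, 0));
    }
}
```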


From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Monday, August 16, 2010 3:57 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

I think the call to client.fetchOne should use delimited format, so that 
DynamicSerDe can deserialize it.
This should be a good short-term fix.

Also on a higher level, DynamicSerDe is deprecated.  It will be great to use 
LazySimpleSerDe to handle all serialization/deserializations instead.

Zheng
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, August 13, 2010 7:02 PM
To: Zheng Shao; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: Deserializing map column via JDBC (HIVE-1378)

Trying to work on HIVE-1378. My first step is to get the Hive JDBC driver to 
return actual values for mapcol in the result set of "select mapcol, bigintcol, 
stringcol from foo", where mapcol is a map column, instead of 
the current behavior of complaining that mapcol's column type is not recognized.

I changed HiveResultSetMetaData.{getColumnType,getColumnTypeName} to recognize 
the map type, but then the returned value for mapcol is always {}, even though 
mapcol does contain some key-value entries. Turns out this is happening in 
HiveQueryResultSet.next:

1.   The call to client.fetchOne returns the string "{"a":"b","x":"y"}   
123 abc".
2.   The serde (DynamicSerDe ds) deserializes the string to the list 
[{},123,"abc"].

The serde cannot correctly deserialize the map because apparently the map is 
not in the serde's expected serialization format. The serde has been 
initialized with TCTLSeparatedProtocol.

Should we make client.fetchOne return a ctrl-separated string? Or should we use 
a different serde/format in HiveQueryResultSet? It seems the first way is 
right; correct me if that's wrong. And how do we do that?

Thanks.
Steven




[jira] Commented: (HIVE-1600) need to sort hook input/output lists for test result determinism

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903129#action_12903129
 ] 

John Sichi commented on HIVE-1600:
--

There are going to be a bunch of one-time .q.out updates associated with this 
in order to bring everything into line.

> need to sort hook input/output lists for test result determinism
> 
>
> Key: HIVE-1600
> URL: https://issues.apache.org/jira/browse/HIVE-1600
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
>
> Begin forwarded message:
> From: Ning Zhang 
> Date: August 26, 2010 2:47:26 PM PDT
> To: John Sichi 
> Cc: "hive-dev@hadoop.apache.org" 
> Subject: Re: failure in load_dyn_part1.q
> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> JVS




[jira] Updated: (HIVE-1600) need to sort hook input/output lists for test result determinism

2010-08-26 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1600:
-

Summary: need to sort hook input/output lists for test result 
determinism  (was: need to support hook input/output lists for test result 
determinism)
Description: 
Begin forwarded message:

From: Ning Zhang 
Date: August 26, 2010 2:47:26 PM PDT
To: John Sichi 
Cc: "hive-dev@hadoop.apache.org" 
Subject: Re: failure in load_dyn_part1.q

Yes I saw this error before but if it does not repro. So it's probably an 
ordering issue in POSTHOOK. 

On Aug 26, 2010, at 2:39 PM, John Sichi wrote:

I'm seeing this failure due to a result diff when running tests on latest trunk:

POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12

Did something change recently?  Or are we missing a Java-level sort on the 
input/output list for determinism?

JVS



  was:

Begin forwarded message:

From: Ning Zhang 
Date: August 26, 2010 2:47:26 PM PDT
To: John Sichi 
Cc: "hive-dev@hadoop.apache.org" 
Subject: Re: failure in load_dyn_part1.q

Yes I saw this error before but if it does not repro. So it's probably an 
ordering issue in POSTHOOK. 

On Aug 26, 2010, at 2:39 PM, John Sichi wrote:

I'm seeing this failure due to a result diff when running tests on latest trunk:

POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12

Did something change recently?  Or are we missing a Java-level sort on the 
input/output list for determinism?

JVS




> need to sort hook input/output lists for test result determinism
> 
>
> Key: HIVE-1600
> URL: https://issues.apache.org/jira/browse/HIVE-1600
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
>
> Begin forwarded message:
> From: Ning Zhang 
> Date: August 26, 2010 2:47:26 PM PDT
> To: John Sichi 
> Cc: "hive-dev@hadoop.apache.org" 
> Subject: Re: failure in load_dyn_part1.q
> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> JVS




[jira] Created: (HIVE-1600) need to support hook input/output lists for test result determinism

2010-08-26 Thread John Sichi (JIRA)
need to support hook input/output lists for test result determinism
---

 Key: HIVE-1600
 URL: https://issues.apache.org/jira/browse/HIVE-1600
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0



Begin forwarded message:

From: Ning Zhang 
Date: August 26, 2010 2:47:26 PM PDT
To: John Sichi 
Cc: "hive-dev@hadoop.apache.org" 
Subject: Re: failure in load_dyn_part1.q

Yes I saw this error before but if it does not repro. So it's probably an 
ordering issue in POSTHOOK. 

On Aug 26, 2010, at 2:39 PM, John Sichi wrote:

I'm seeing this failure due to a result diff when running tests on latest trunk:

POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12

Did something change recently?  Or are we missing a Java-level sort on the 
input/output list for determinism?

JVS






Re: failure in load_dyn_part1.q

2010-08-26 Thread John Sichi
OK, I'll create a patch for sorting them in Pre/PostExecutePrinter.

JVS
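
A minimal sketch of the print-time fix: copy and sort the entries before emitting the POSTHOOK lines, so the recorded order is stable. Method and class names here are placeholders, not Pre/PostExecutePrinter's real signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedHookPrinter {
    // Sort a copy before printing so test logs (.q.out files) are stable
    // even when the underlying collection's iteration order is not.
    public static List<String> format(String prefix, List<String> entries) {
        List<String> sorted = new ArrayList<>(entries);
        Collections.sort(sorted);
        List<String> lines = new ArrayList<>();
        for (String e : sorted) {
            lines.add(prefix + ": " + e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : format("POSTHOOK: Output", List.of(
                "default@nzhang_part2@ds=2008-12-31/hr=11",
                "default@nzhang_part1@ds=2008-04-08/hr=11"))) {
            System.out.println(line);
        }
    }
}
```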

On Aug 26, 2010, at 2:47 PM, Ning Zhang wrote:

> Yes I saw this error before but if it does not repro. So it's probably an 
> ordering issue in POSTHOOK. 
> 
> On Aug 26, 2010, at 2:39 PM, John Sichi wrote:
> 
>> I'm seeing this failure due to a result diff when running tests on latest 
>> trunk:
>> 
>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
>> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
>> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
>> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
>> 
>> Did something change recently?  Or are we missing a Java-level sort on the 
>> input/output list for determinism?
>> 
>> JVS
>> 
> 



Re: failure in load_dyn_part1.q

2010-08-26 Thread Ning Zhang
Yes, I saw this error before, but it does not repro. So it's probably an 
ordering issue in POSTHOOK. 

On Aug 26, 2010, at 2:39 PM, John Sichi wrote:

> I'm seeing this failure due to a result diff when running tests on latest 
> trunk:
> 
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
> POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> -POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
> POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
> +POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
> 
> Did something change recently?  Or are we missing a Java-level sort on the 
> input/output list for determinism?
> 
> JVS
> 



failure in load_dyn_part1.q

2010-08-26 Thread John Sichi
I'm seeing this failure due to a result diff when running tests on latest trunk:

 POSTHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
 POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
 POSTHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
-POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12
 POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=11
 POSTHOOK: Output: defa...@nzhang_part1@ds=2008-04-08/hr=12
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=11
+POSTHOOK: Output: defa...@nzhang_part2@ds=2008-12-31/hr=12

Did something change recently?  Or are we missing a Java-level sort on the 
input/output list for determinism?

JVS



[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903082#action_12903082
 ] 

Carl Steinbach commented on HIVE-675:
-

Cited the wrong lines from the patch. Here are the correct ones:

{noformat}
...
diff --git 
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java
 
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java
deleted file mode 100644
index bc950b9..000
--- 
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java
+++ /dev/null
...
{noformat}

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-675:


Status: Patch Available  (was: Open)

@Namit: the patch I supplied deletes TestHiveMetaStoreRemote.java and replaces 
it with TestRemoteHiveMetaStore and TestEmbeddedHiveMetaStore. I'm not sure why 
this is not working for you. I tried applying the patch again to a fresh 
checkout of the Hive svn repo and confirmed that TestHiveMetaStoreRemote.java 
was deleted.

Here are the relevant lines in the patch:
{noformat}
diff --git eclipse-templates/TestRemoteHiveMetaStore.launchtemplate 
eclipse-templates/TestRemoteHiveMetaStore.launchtemplate
new file mode 100644
index 000..3600e5c
--- /dev/null
{noformat}



> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903035#action_12903035
 ] 

John Sichi commented on HIVE-1546:
--

Thanks, I took a look.  But a factory producing a factory seems like overkill here.

What I was thinking in my previous comment was as follows:

* only one level of factory
* define a new interface HiveSemanticAnalyzer (not an abstract class): copy the 
signatures of the interface methods from the public methods on 
BaseSemanticAnalyzer; add Javadoc (I can help with that if any of them are 
non-obvious)
* when callers get a new analyzer instance from the factory, they refer to it 
via the HiveSemanticAnalyzer interface only

If that doesn't work for some reason, then we can just use your original 
pattern where the factory returns a class (BaseSemanticAnalyzer) instead of an 
interface.
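The single-level-factory shape described above might look roughly like this (all names here are illustrative; the real interface and factory in the patch may differ):

```java
public class AnalyzerFactoryDemo {
    // Hypothetical interface name from the discussion; the real method
    // signatures would be copied from BaseSemanticAnalyzer's public methods.
    interface HiveSemanticAnalyzer {
        void analyze(String ast);
    }

    // Default implementation standing in for Hive's built-in analyzer.
    static class DefaultAnalyzer implements HiveSemanticAnalyzer {
        public void analyze(String ast) {
            System.out.println("analyzing: " + ast);
        }
    }

    // A single level of factory: callers only ever see the interface type,
    // so external projects can plug in their own implementation.
    static HiveSemanticAnalyzer createAnalyzer() {
        return new DefaultAnalyzer();
    }

    public static void main(String[] args) {
        HiveSemanticAnalyzer analyzer = createAnalyzer();
        analyzer.analyze("TOK_QUERY");
    }
}
```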

Regarding handleGenericFileFormat:  I still don't understand.  Your code in 
Hive is ignoring the return value (and the only implementation doesn't return 
anything, it just throws).   So, either

(a) you're planning to do something inside of Howl with it; in that case, the 
Hive method is just a hook for you to intercept, and it should return void

or

(b) you're planning to add some more code inside of Hive which actually does 
something with the return value (e.g. sets the serde+inputformat+outputformat); 
in that case, you need to keep working on the patch to make this happen


> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546.patch, hive-1546_2.patch
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] Draft Resolution to make Hive a TLP

2010-08-26 Thread Ashish Thusoo
This vote passes unanimously. Will be sending out the draft proposal to 
gene...@hadoop today.

Thanks,
Ashish 

-Original Message-
From: Paul Yang [mailto:py...@facebook.com] 
Sent: Thursday, August 26, 2010 12:36 AM
To: hive-dev@hadoop.apache.org
Subject: RE: [VOTE] Draft Resolution to make Hive a TLP

+1

-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Wednesday, August 25, 2010 10:03 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [VOTE] Draft Resolution to make Hive a TLP

+1

On Wed, Aug 25, 2010 at 2:30 PM, yongqiang he  wrote:
> +1
>
> Yongqiang
>
> On Wed, Aug 25, 2010 at 11:21 AM, John Sichi  wrote:
>> +1
>>
>> JVS
>>
>> On Aug 25, 2010, at 8:06 AM, Edward Capriolo wrote:
>>
>>> +1
>>>
>>> On Wed, Aug 25, 2010 at 1:43 AM, Namit Jain  wrote:
 +1

 
 From: Ning Zhang [nzh...@facebook.com]
 Sent: Tuesday, August 24, 2010 9:18 PM
 To: 
 Subject: Re: [VOTE] Draft Resolution to make Hive a TLP

 +1

 On Aug 24, 2010, at 8:50 PM, Carl Steinbach wrote:

> +1
>
> On Tue, Aug 24, 2010 at 6:56 PM, Ashish Thusoo  
> wrote:
>
>> Folks,
>>
>> I am going to make the following proposal at 
>> gene...@hadoop.apache.org
>>
>> In summary this proposal does the following things:
>>
>> 1. Establishes the PMC as comprising the current committers of 
>> Hive (as of today - 8/24/2010).
>>
>> 2. Proposes Namit Jain as the chair of the project (PMC chairs 
>> have no more power than other PMC members, but they are 
>> responsible for writing regular reports for the Apache board, 
>> assigning rights to new committers, etc.)
>>
>> 3. Tasks the PMC to come up with the bylaws for governance of the 
>> project.
>>
>> Please vote on this as soon as possible(yes I should have done 
>> this as part of the earlier vote, but please bear with me), so 
>> that we can get the ball rolling on this...
>>
>> Thanks,
>> Ashish
>>
>> Draft Resolution to be sent to the Apache Board
>> ---
>>
>> Establish the Apache Hive Project
>>
>>        WHEREAS, the Board of Directors deems it to be in the best
>>        interests of the Foundation and consistent with the
>>        Foundation's purpose to establish a Project Management
>>        Committee charged with the creation and maintenance of
>>        open-source software related to parallel analysis of large
>>        data sets for distribution at no charge to the public.
>>
>>        NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>        Committee (PMC), to be known as the "Apache Hive Project",
>>        be and hereby is established pursuant to Bylaws of the
>>        Foundation; and be it further
>>
>>        RESOLVED, that the Apache Hive Project be and hereby is
>>        responsible for the creation and maintenance of software
>>        related to parallel analysis of large data sets; and be
>>        it further
>>
>>        RESOLVED, that the office of "Vice President, Apache Hive" 
>> be
>>        and hereby is created, the person holding such office to
>>        serve at the direction of the Board of Directors as the 
>> chair
>>        of the Apache Hive Project, and to have primary 
>> responsibility
>>        for management of the projects within the scope of
>>        responsibility of the Apache Hive Project; and be it 
>> further
>>
>>        RESOLVED, that the persons listed immediately below be and
>>        hereby are appointed to serve as the initial members of 
>> the
>>        Apache Hive Project:
>>            * Namit Jain (na...@apache.org)
>>            * John Sichi (j...@apache.org)
>>            * Zheng Shao (zs...@apache.org)
>>            * Edward Capriolo (appodic...@apache.org)
>>            * Raghotham Murthy (r...@apache.org)
>>            * Ning Zhang (nzh...@apache.org)
>>            * Paul Yang (pa...@apache.org)
>>            * He Yongqiang (he yongqi...@apache.org)
>>            * Prasad Chakka (pras...@apache.org)
>>            * Joydeep Sen Sarma (jsensa...@apache.org)
>>            * Ashish Thusoo (athu...@apache.org)
>>
>>        NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain
>>        be appointed to the office of Vice President, Apache Hive, 
>> to
>>        serve in accordance with and subject to the direction of 
>> the
>>        Board of Directors and the Bylaws of the Foundation until
>>        death, resignation, retirement, removal or 
>> disqualification,
>>        or until a successor is appointed; and be it further
>>
>>        RESOLVED, that the initial Apache Hive PMC be and hereby 
>> is
>>

Re: Deserializing map column via JDBC (HIVE-1378)

2010-08-26 Thread John Sichi
If you replace DynamicSerDe with LazySimpleSerDe on the JDBC client side, can't 
you then tell it to expect JSON serialization for the maps?  That way you can 
leave the FetchTask server side as is.

JVS

On Aug 24, 2010, at 2:50 PM, Steven Wong wrote:

I got sidetracked for a while.

Looking at client.fetchOne, it is a call to the Hive server, which shows the 
following call stack:

SerDeUtils.getJSONString(Object, ObjectInspector) line: 205
LazySimpleSerDe.serialize(Object, ObjectInspector) line: 420
FetchTask.fetch(ArrayList) line: 130
Driver.getResults(ArrayList) line: 660
HiveServer$HiveServerHandler.fetchOne() line: 238

In other words, FetchTask.mSerde (an instance of LazySimpleSerDe) serializes 
the map column into JSON strings. It’s because FetchTask.mSerde has been 
initialized by FetchTask.initialize to do it that way.

It appears that the fix is to initialize FetchTask.mSerde differently to do 
ctrl-serialization instead – presumably for the JDBC use case only and not for 
other use cases of FetchTask. Further, it appears that FetchTask.mSerde will do 
ctrl-serialization if it is initialized (via the properties “columns” and 
“columns.types”) with the proper schema.

Are these right? Pointers on how to get the proper schema? (From 
FetchTask.work?) And on how to restrict the change to JDBC only? (I have no 
idea.)

For symmetry, LazySimpleSerDe should be used to do ctrl-deserialization on the 
client side, per Zheng’s suggestion.

Steven
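For context, Hive's default "ctrl" text serialization separates fields with \001, collection items with \002, and map keys from values with \003. A minimal illustrative parser for such a row (this is a sketch of the format only, not Hive's actual LazySimpleSerDe code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CtrlSepDemo {
    // Parse one map-typed field: entries separated by \002,
    // key and value within an entry separated by \003.
    static Map<String, String> parseMap(String field) {
        Map<String, String> m = new LinkedHashMap<>();
        if (field.isEmpty()) {
            return m;
        }
        for (String entry : field.split("\u0002")) {
            String[] kv = entry.split("\u0003", 2);
            m.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return m;
    }

    public static void main(String[] args) {
        // A ctrl-separated row: map column, bigint column, string column.
        String row = "a\u0003b\u0002x\u0003y" + "\u0001" + "123" + "\u0001" + "abc";
        String[] cols = row.split("\u0001");
        System.out.println(parseMap(cols[0])); // {a=b, x=y}
    }
}
```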


From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Monday, August 16, 2010 3:57 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

I think the call to client.fetchOne should use delimited format, so that 
DynamicSerDe can deserialize it.
This should be a good short-term fix.

Also on a higher level, DynamicSerDe is deprecated.  It will be great to use 
LazySimpleSerDe to handle all serialization/deserializations instead.

Zheng
From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, August 13, 2010 7:02 PM
To: Zheng Shao; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: Deserializing map column via JDBC (HIVE-1378)

Trying to work on HIVE-1378. My first step is to get the Hive JDBC driver to 
return actual values for mapcol in the result set of “select mapcol, bigintcol, 
stringcol from foo”, where mapcol is a map column, instead of 
the current behavior of complaining that mapcol’s column type is not recognized.

I changed HiveResultSetMetaData.{getColumnType,getColumnTypeName} to recognize 
the map type, but then the returned value for mapcol is always {}, even though 
mapcol does contain some key-value entries. Turns out this is happening in 
HiveQueryResultSet.next:

1. The call to client.fetchOne returns the string “{"a":"b","x":"y"}   123 abc”.
2. The serde (DynamicSerDe ds) deserializes the string to the list [{},123,"abc"].

The serde cannot correctly deserialize the map because apparently the map is 
not in the serde’s expected serialization format. The serde has been 
initialized with TCTLSeparatedProtocol.

Should we make client.fetchOne return a ctrl-separated string? Or should we use 
a different serde/format in HiveQueryResultSet? It seems the first way is 
right; correct me if that’s wrong. And how do we do that?

Thanks.
Steven




[jira] Commented: (HIVE-1487) parallelize test query runs

2010-08-26 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902977#action_12902977
 ] 

Joydeep Sen Sarma commented on HIVE-1487:
-

yeah - that would be my gut feel too (just ditch junit)

however - we are going to lose the junit-style test outputs etc. A long time 
back Ashish did all the velocity work to get junit tests. I don't remember the 
exact thinking at that time - but a majority of people wanted to use junit.

threading would actually be good though .. (we have a separate multithreaded 
test right now that we could happily obsolete)

> parallelize test query runs
> ---
>
> Key: HIVE-1487
> URL: https://issues.apache.org/jira/browse/HIVE-1487
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Joydeep Sen Sarma
>
> HIVE-1464 sped up serial runs somewhat - but it looks like it's still too 
> slow. we should use parallel junit or some similar setup to run test queries 
> in parallel. this should be really easy as we'll need to just use a separate 
> warehouse/metadb and potentially a mapred system dir location.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1594) Typo of hive.merge.size.smallfiles.avgsize prevents change of value

2010-08-26 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1594.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
   0.7.0
   Resolution: Fixed

Committed in 0.6 and trunk. Thanks Yun

> Typo of hive.merge.size.smallfiles.avgsize prevents change of value
> ---
>
> Key: HIVE-1594
> URL: https://issues.apache.org/jira/browse/HIVE-1594
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Yun Huang Yong
>Assignee: Yun Huang Yong
>Priority: Minor
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1594-0.5.patch, HIVE-1594.patch
>
>
> The setting is described as hive.merge.size.smallfiles.avgsize, 
> however common/src/java/org/apache/hadoop/hive/conf/HiveConf.java reads it as 
> "hive.merge.smallfiles.avgsize" (note the missing '.size.') so the user's 
> setting has no effect and the value is stuck at the default of 16MB.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
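The effect of the key mismatch in HIVE-1594 can be reproduced with plain java.util.Properties (a sketch; HiveConf's actual lookup machinery differs, and the 16000000 default is used here only for illustration):

```java
import java.util.Properties;

public class ConfTypoDemo {
    // Pre-fix behavior: HiveConf reads a key that drops the ".size." segment,
    // so the value the user set under the documented name is never seen.
    static String effectiveValue(Properties conf) {
        return conf.getProperty("hive.merge.smallfiles.avgsize", "16000000");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // User sets the documented name...
        conf.setProperty("hive.merge.size.smallfiles.avgsize", "33554432");
        // ...but the misspelled lookup falls back to the default.
        System.out.println(effectiveValue(conf)); // 16000000
    }
}
```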



[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-675:


Status: Open  (was: Patch Available)

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902968#action_12902968
 ] 

Namit Jain commented on HIVE-675:
-

compile-test:
[javac] Compiling 4 source files to 
/data/users/njain/hive_commit2/hive_commit2/build/metastore/test/classes
[javac] 
/data/users/njain/hive_commit2/hive_commit2/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java:76:
 
partitionTester(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf)
 in org.apache.hadoop.hive.metastore.TestHiveMetaStore cannot be applied to 
(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf,boolean)
[javac] TestHiveMetaStore.partitionTester(client, hiveConf, true);
[javac]  ^
[javac] 1 error


> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1487) parallelize test query runs

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902967#action_12902967
 ] 

John Sichi commented on HIVE-1487:
--

Zheng has a private ptest shell script which takes a different approach:  it 
enumerates all of the .q files, partitions them modulo the degree of 
parallelism, and then executes them in separate environments as separate 
processes.  One advantage is that there's no cross-talk from threading 
(although in the long run testing with multithreading for independent tests 
would be good for coverage).

I'll send you a pointer.
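The partition-by-modulo approach described above can be sketched as follows (illustrative only, not the actual ptest script; sorting first keeps the assignment deterministic across runs):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PtestPartition {
    // Deterministically split the .q files into `parallelism` buckets,
    // each of which can run in its own environment/process.
    static List<List<String>> partition(List<String> qfiles, int parallelism) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new ArrayList<>());
        }
        List<String> sorted = new ArrayList<>(qfiles);
        Collections.sort(sorted);
        for (int i = 0; i < sorted.size(); i++) {
            buckets.get(i % parallelism).add(sorted.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> q = Arrays.asList(
            "join1.q", "load_dyn_part1.q", "udf_reflect.q", "groupby2.q");
        System.out.println(partition(q, 2));
    }
}
```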


> parallelize test query runs
> ---
>
> Key: HIVE-1487
> URL: https://issues.apache.org/jira/browse/HIVE-1487
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Joydeep Sen Sarma
>
> HIVE-1464 sped up serial runs somewhat - but it looks like it's still too 
> slow. we should use parallel junit or some similar setup to run test queries 
> in parallel. this should be really easy as we'll need to just use a separate 
> warehouse/metadb and potentially a mapred system dir location.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902962#action_12902962
 ] 

John Sichi commented on HIVE-1536:
--

+1.  Will commit when tests pass.


> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Affects Versions: 0.6.0
>Reporter: Sean Flatley
>Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, HIVE-1536-3.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log, 
> TestJdbcDriver-ant-3.log
>
>
> As a result of a Sprint which had us using Pentaho Data Integration with the 
> Hive database we have updated the driver.  Many PreparedStatement methods 
> have been implemented.  A patch will be attached tomorrow with a summary of 
> changes.
> Note:  A checkout of Hive/trunk was performed and the TestJdbcDriver test 
> case was run.  This was done before any modifications were made to the 
> checked out project.  The testResultSetMetaData test failed:
> java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>   at 
> org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:154)
>   at junit.framework.TestCase.runBare(TestCase.java:127)
>   at junit.framework.TestResult$1.protect(TestResult.java:106)
>   at junit.framework.TestResult.runProtected(TestResult.java:124)
>   at junit.framework.TestResult.run(TestResult.java:109)
>   at junit.framework.TestCase.run(TestCase.java:118)
>   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>   at junit.framework.TestSuite.run(TestSuite.java:203)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> A co-worker did the same and the tests passed.  Both environments were Ubuntu 
> and Hadoop version 0.20.2.
> Tests added to the TestJdbcDriver by us were successful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1487) parallelize test query runs

2010-08-26 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902958#action_12902958
 ] 

Joydeep Sen Sarma commented on HIVE-1487:
-

can people with experience running java tests in parallel comment on this? So 
far these seem to be the choices:

* upgrade to junit4 and use custom runner that runs in parallel. the downside 
here is that junit does not seem to come with this parallel runner (but there's 
additional code on the web from the junit authors that does the same)

* use parallel-junit. this seems the least disruptive - but this seems like an 
old/dead project

* use TestNG - this is a replacement for junit that has inbuilt parallel 
execution support. but we would not be using junit anymore at all.

any other thoughts on better test setup welcome as well.

> parallelize test query runs
> ---
>
> Key: HIVE-1487
> URL: https://issues.apache.org/jira/browse/HIVE-1487
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Joydeep Sen Sarma
>
> HIVE-1464 sped up serial runs somewhat - but it looks like it's still too 
> slow. we should use parallel junit or some similar setup to run test queries 
> in parallel. this should be really easy as we'll need to just use a separate 
> warehouse/metadb and potentially a mapred system dir location.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1594) Typo of hive.merge.size.smallfiles.avgsize prevents change of value

2010-08-26 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1594:


Assignee: Yun Huang Yong

> Typo of hive.merge.size.smallfiles.avgsize prevents change of value
> ---
>
> Key: HIVE-1594
> URL: https://issues.apache.org/jira/browse/HIVE-1594
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Yun Huang Yong
>Assignee: Yun Huang Yong
>Priority: Minor
> Attachments: HIVE-1594-0.5.patch, HIVE-1594.patch
>
>
> The setting is described as hive.merge.size.smallfiles.avgsize, 
> however common/src/java/org/apache/hadoop/hive/conf/HiveConf.java reads it as 
> "hive.merge.smallfiles.avgsize" (note the missing '.size.') so the user's 
> setting has no effect and the value is stuck at the default of 16MB.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1594) Typo of hive.merge.size.smallfiles.avgsize prevents change of value

2010-08-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902933#action_12902933
 ] 

Namit Jain commented on HIVE-1594:
--

Yun, we are not actively patching 0.5 right now.
I will review the changes for trunk and 0.6

> Typo of hive.merge.size.smallfiles.avgsize prevents change of value
> ---
>
> Key: HIVE-1594
> URL: https://issues.apache.org/jira/browse/HIVE-1594
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Yun Huang Yong
>Assignee: Yun Huang Yong
>Priority: Minor
> Attachments: HIVE-1594-0.5.patch, HIVE-1594.patch
>
>
> The setting is described as hive.merge.size.smallfiles.avgsize, 
> however common/src/java/org/apache/hadoop/hive/conf/HiveConf.java reads it as 
> "hive.merge.smallfiles.avgsize" (note the missing '.size.') so the user's 
> setting has no effect and the value is stuck at the default of 16MB.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-26 Thread Sean Flatley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Flatley updated HIVE-1536:
---

Status: Patch Available  (was: Open)

Please see the change log for details.

Change log: HIVE-1536-changes-3.txt
Junit log file : TestJdbcDriver-ant-3.log
Patch: HIVE-1536-3.patch

> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Affects Versions: 0.6.0
>Reporter: Sean Flatley
>Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, HIVE-1536-3.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log, 
> TestJdbcDriver-ant-3.log
>
>
> As a result of a Sprint which had us using Pentaho Data Integration with the 
> Hive database we have updated the driver.  Many PreparedStatement methods 
> have been implemented.  A patch will be attached tomorrow with a summary of 
> changes.
> Note:  A checkout of Hive/trunk was performed and the TestJdbcDriver test 
> case was run.  This was done before any modifications were made to the 
> checked out project.  The testResultSetMetaData test failed:
> java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>   at 
> org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:154)
>   at junit.framework.TestCase.runBare(TestCase.java:127)
>   at junit.framework.TestResult$1.protect(TestResult.java:106)
>   at junit.framework.TestResult.runProtected(TestResult.java:124)
>   at junit.framework.TestResult.run(TestResult.java:109)
>   at junit.framework.TestCase.run(TestCase.java:118)
>   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>   at junit.framework.TestSuite.run(TestSuite.java:203)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> A co-worker did the same and the tests passed.  Both environments were Ubuntu 
> and Hadoop version 0.20.2.
> Tests added to the TestJdbcDriver by us were successful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-471) A UDF for simple reflection

2010-08-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902928#action_12902928
 ] 

Namit Jain commented on HIVE-471:
-

@Edward, negative tests are good. We are still adding a lot of them.

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff
>
>
> There are many methods in Java that are static and take no arguments, or can 
> be invoked with one simple parameter. More complicated functions will require 
> a dedicated UDF, but one generic one can serve as a poor man's UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}
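As a rough illustration of the idea (not the patch's actual implementation), here is a minimal, static-only sketch in plain Java. The helper name `reflect` and its signature are assumptions for illustration; the real UDF would also need to handle instance methods such as isEmpty(), overload resolution, and Hive type conversion.

```java
import java.lang.reflect.Method;

public class ReflectSketch {
    // Hypothetical sketch of a generic reflect() helper: load the class,
    // look up a method matching the argument types, and invoke it.
    // Static methods only in this sketch.
    static Object reflect(String className, String methodName, Object... args) {
        try {
            Class<?> cls = Class.forName(className);
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                // Unbox Integer to int so lookups like String.valueOf(int) succeed.
                types[i] = (args[i] instanceof Integer) ? int.class : args[i].getClass();
            }
            Method m = cls.getMethod(methodName, types);
            return m.invoke(null, args); // null receiver: static methods only
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reflect("java.lang.String", "valueOf", 1)); // prints 1
    }
}
```

Invoked from HiveQL as in the example above, each row would call through to the named Java method with the supplied arguments.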

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-26 Thread Sean Flatley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Flatley updated HIVE-1536:
---

Attachment: HIVE-1536-3.patch

Revision 3 patch.

> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Affects Versions: 0.6.0
>Reporter: Sean Flatley
>Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, HIVE-1536-3.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log, 
> TestJdbcDriver-ant-3.log
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-26 Thread Sean Flatley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Flatley updated HIVE-1536:
---

Attachment: TestJdbcDriver-ant-3.log

Unit test results for revision 3.

> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Affects Versions: 0.6.0
>Reporter: Sean Flatley
>Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, HIVE-1536-3.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log, 
> TestJdbcDriver-ant-3.log
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-26 Thread Sean Flatley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Flatley updated HIVE-1536:
---

Attachment: HIVE-1536-changes-3.txt

Revision 3 change log.

> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Affects Versions: 0.6.0
>Reporter: Sean Flatley
>Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-08-26 Thread Dave Brondsema (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902888#action_12902888
 ] 

Dave Brondsema commented on HIVE-1019:
--

I'm getting this error with the simplest of queries, on 0.5.0 release and a 
recent build of the 0.6.0 SVN branch.

I have an empty table 'test'.
{noformat}
hive> select * from test;
OK
Time taken: 2.916 seconds
hive> select count(1) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201002192051_14077, Tracking URL = 
http://hadoop-namenode-1.v39.ch3.sourceforge.com:50030/jobdetails.jsp?jobid=job_201002192051_14077
Kill Command = /usr/lib/hadoop/bin/hadoop job  
-Dmapred.job.tracker=hadoop-namenode-1.v39.ch3.sourceforge.com:8021 -kill 
job_201002192051_14077
2010-08-26 14:53:11,348 Stage-1 map = 0%,  reduce = 0%
2010-08-26 14:53:21,413 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201002192051_14077 with errors

Failed tasks with most(2) failures : 
Task URL: 
http://hadoop-namenode-1.v39.ch3.sourceforge.com:50030/taskdetails.jsp?jobid=job_201002192051_14077&tipid=task_201002192051_14077_m_00

FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.ExecDriver
{noformat}

If I check the job tracker URL it shows:
{noformat}
2010-08-26 14:41:05,789 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2010-08-26 14:41:05,865 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2010-08-26 14:41:05,881 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2010-08-26 14:41:05,929 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2010-08-26 14:41:05,930 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2010-08-26 14:41:05,988 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:110)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:244)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:208)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
Caused by: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
{noformat}

[jira] Commented: (HIVE-471) A UDF for simple reflection

2010-08-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902883#action_12902883
 ] 

Edward Capriolo commented on HIVE-471:
--

@Namit,
I thought we were getting rid of negative tests. Am I misinformed?


> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-26 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902774#action_12902774
 ] 

Carl Steinbach commented on HIVE-1578:
--

@Paul: I understand that showing % complete is not possible, but I do think it 
would help to log a message as each task completion event is retrieved so that 
the user sees that progress is being made. The most frustrating thing about the 
current behavior is that the CLI appears to simply freeze, leaving the user with 
no idea what's going on.

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most 
> number of failures for easy access to the error logs. However, generating the 
> link may require many RPC's to get all the task completion events, adding a 
> delay of up to 30 minutes. This patch adds a configuration variable to 
> control whether the link is generated. Turning off this feature would also 
> disable automatic debugging tips generated by heuristics reading from the 
> error logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1594) Typo of hive.merge.size.smallfiles.avgsize prevents change of value

2010-08-26 Thread Yun Huang Yong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Huang Yong updated HIVE-1594:
-

Attachment: HIVE-1594.patch
HIVE-1594-0.5.patch

HIVE-1594.patch applies cleanly to 0.6 and trunk.

A separate patch is provided for branch 0.5, as it was missing the 
smallfiles.avgsize setting altogether.

> Typo of hive.merge.size.smallfiles.avgsize prevents change of value
> ---
>
> Key: HIVE-1594
> URL: https://issues.apache.org/jira/browse/HIVE-1594
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Yun Huang Yong
>Priority: Minor
> Attachments: HIVE-1594-0.5.patch, HIVE-1594.patch
>
>
> The setting is documented as hive.merge.size.smallfiles.avgsize; 
> however, common/src/java/org/apache/hadoop/hive/conf/HiveConf.java reads it as 
> "hive.merge.smallfiles.avgsize" (note the missing '.size.'), so the user's 
> setting has no effect and the value is stuck at the default of 16MB.
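To make the failure mode concrete, here is a tiny self-contained sketch (hypothetical class and method names, not the actual HiveConf code): the code reads a differently spelled key than the one users set, so the documented setting is silently ignored and the default wins.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfTypoSketch {
    // The key users are told to set, and the (typo'd) key the code reads.
    static final String DOCUMENTED_KEY = "hive.merge.size.smallfiles.avgsize";
    static final String KEY_READ_BY_CODE = "hive.merge.smallfiles.avgsize"; // missing '.size.'
    static final long DEFAULT_AVG_SIZE = 16L * 1024 * 1024; // 16MB default

    // Mimics a conf lookup: falls back to the default when the key is absent.
    static long effectiveAvgSize(Map<String, String> userSettings) {
        String v = userSettings.get(KEY_READ_BY_CODE);
        return (v == null) ? DEFAULT_AVG_SIZE : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DOCUMENTED_KEY, "1000000"); // user sets the documented name
        // The typo'd key was never set, so the value stays at the default.
        System.out.println(effectiveAvgSize(conf)); // prints 16777216
    }
}
```

The patch presumably aligns the two spellings so that the documented key is the one actually read.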

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] Draft Resolution to make Hive a TLP

2010-08-26 Thread Paul Yang
+1

-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com] 
Sent: Wednesday, August 25, 2010 10:03 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [VOTE] Draft Resolution to make Hive a TLP

+1

On Wed, Aug 25, 2010 at 2:30 PM, yongqiang he  wrote:
> +1
>
> Yongqiang
>
> On Wed, Aug 25, 2010 at 11:21 AM, John Sichi  wrote:
>> +1
>>
>> JVS
>>
>> On Aug 25, 2010, at 8:06 AM, Edward Capriolo wrote:
>>
>>> +1
>>>
>>> On Wed, Aug 25, 2010 at 1:43 AM, Namit Jain  wrote:
 +1

 
 From: Ning Zhang [nzh...@facebook.com]
 Sent: Tuesday, August 24, 2010 9:18 PM
 To: 
 Subject: Re: [VOTE] Draft Resolution to make Hive a TLP

 +1

 On Aug 24, 2010, at 8:50 PM, Carl Steinbach wrote:

> +1
>
> On Tue, Aug 24, 2010 at 6:56 PM, Ashish Thusoo  
> wrote:
>
>> Folks,
>>
>> I am going to make the following proposal at gene...@hadoop.apache.org
>>
>> In summary this proposal does the following things:
>>
>> 1. Establishes the PMC as comprising the current committers of Hive (as
>> of today - 8/24/2010).
>>
>> 2. Proposes Namit Jain as the chair of the project (PMC chairs have no more
>> power than other PMC members, but they are responsible for writing regular
>> reports for the Apache board, assigning rights to new committers, etc.)
>>
>> 3. Tasks the PMC to come up with the bylaws for governance of the 
>> project.
>>
>> Please vote on this as soon as possible(yes I should have done this as 
>> part
>> of the earlier vote, but please bear with me), so that we can get the 
>> ball
>> rolling on this...
>>
>> Thanks,
>> Ashish
>>
>> Draft Resolution to be sent to the Apache Board
>> ---
>>
>> Establish the Apache Hive Project
>>
>>        WHEREAS, the Board of Directors deems it to be in the best
>>        interests of the Foundation and consistent with the
>>        Foundation's purpose to establish a Project Management
>>        Committee charged with the creation and maintenance of
>>        open-source software related to parallel analysis of large
>>        data sets for distribution at no charge to the public.
>>
>>        NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>        Committee (PMC), to be known as the "Apache Hive Project",
>>        be and hereby is established pursuant to Bylaws of the
>>        Foundation; and be it further
>>
>>        RESOLVED, that the Apache Hive Project be and hereby is
>>        responsible for the creation and maintenance of software
>>        related to parallel analysis of large data sets; and be
>>        it further
>>
>>        RESOLVED, that the office of "Vice President, Apache Hive" be
>>        and hereby is created, the person holding such office to
>>        serve at the direction of the Board of Directors as the chair
>>        of the Apache Hive Project, and to have primary responsibility
>>        for management of the projects within the scope of
>>        responsibility of the Apache Hive Project; and be it further
>>
>>        RESOLVED, that the persons listed immediately below be and
>>        hereby are appointed to serve as the initial members of the
>>        Apache Hive Project:
>>            * Namit Jain (na...@apache.org)
>>            * John Sichi (j...@apache.org)
>>            * Zheng Shao (zs...@apache.org)
>>            * Edward Capriolo (appodic...@apache.org)
>>            * Raghotham Murthy (r...@apache.org)
>>            * Ning Zhang (nzh...@apache.org)
>>            * Paul Yang (pa...@apache.org)
>>            * He Yongqiang (he yongqi...@apache.org)
>>            * Prasad Chakka (pras...@apache.org)
>>            * Joydeep Sen Sarma (jsensa...@apache.org)
>>            * Ashish Thusoo (athu...@apache.org)
>>
>>        NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain
>>        be appointed to the office of Vice President, Apache Hive, to
>>        serve in accordance with and subject to the direction of the
>>        Board of Directors and the Bylaws of the Foundation until
>>        death, resignation, retirement, removal or disqualification,
>>        or until a successor is appointed; and be it further
>>
>>        RESOLVED, that the initial Apache Hive PMC be and hereby is
>>        tasked with the creation of a set of bylaws intended to
>>        encourage open development and increased participation in the
>>        Apache Hive Project; and be it further
>>
>>        RESOLVED, that the Apache Hive Project be and hereby
>>        is tasked with the migration and rationalization of the Apache
>>        Hadoop H