[jira] Updated: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1345:
---

Status: Patch Available  (was: Open)

The problem was caused by incorrect parsing of the {{columnTypeProperty}} 
during the initialization of {{TypedBytesSerDe}}. This patch fixes it by 
delegating the parsing logic to the standard routine used by other SerDes, 
{{TypeInfoUtils.getTypeInfosFromTypeString()}}.
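
To illustrate the shape of the fix, here is a minimal sketch, not the literal 
patch; the property keys ({{columns}}, {{columns.types}}) and the sample values 
are assumptions based on standard SerDe conventions, and only 
{{TypeInfoUtils.getTypeInfosFromTypeString()}} is taken from the description 
above:

{code}
// Minimal sketch of the corrected parsing path (not the actual patch).
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class ColumnTypeParsingSketch {
  public static void main(String[] args) {
    // Simulated table properties, as the metastore would hand them to
    // SerDe.initialize().
    Properties tbl = new Properties();
    tbl.setProperty("columns", "a,b");
    tbl.setProperty("columns.types", "string:string");

    List<String> columnNames =
        Arrays.asList(tbl.getProperty("columns").split(","));
    // The fix: let the standard routine split the colon-separated type
    // string instead of parsing it by hand, which broke on multiple columns.
    List<TypeInfo> columnTypes =
        TypeInfoUtils.getTypeInfosFromTypeString(tbl.getProperty("columns.types"));

    System.out.println(columnNames + " -> " + columnTypes);
  }
}
{code}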

Also included in this patch is a test case that exercises this change and 
validates that multi-column tables can be created when using this SerDe.

> TypedBytesSerDe fails to create table with multiple columns.
> 
>
> Key: HIVE-1345
> URL: https://issues.apache.org/jira/browse/HIVE-1345
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Contrib
>Affects Versions: 0.5.0
> Environment: JDK 6 (1.6.0_17) on Mac OS X 10.6.3, Hadoop 0.20.2, Hive 
> 0.5.0
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1345-1.patch
>
>
> Creating a table with more than one column fails when the row format SerDe 
> is TypedBytesSerDe. 
> {code}
> hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
> Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe 
>   
> FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, 
> Size: 1   
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
>   
> hive> 
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1345:
---

Attachment: HIVE-1345-1.patch

> TypedBytesSerDe fails to create table with multiple columns.
> 
>
> Key: HIVE-1345
> URL: https://issues.apache.org/jira/browse/HIVE-1345
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Contrib
>Affects Versions: 0.5.0
> Environment: JDK 6 (1.6.0_17) on Mac OS X 10.6.3, Hadoop 0.20.2, Hive 
> 0.5.0
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1345-1.patch
>
>
> Creating a table with more than one column fails when the row format SerDe 
> is TypedBytesSerDe. 
> {code}
> hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
> Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe 
>   
> FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, 
> Size: 1   
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
>   
> hive> 
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)
TypedBytesSerDe fails to create table with multiple columns.


 Key: HIVE-1345
 URL: https://issues.apache.org/jira/browse/HIVE-1345
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.5.0
 Environment: JDK 6 (1.6.0_17) on Mac OS X 10.6.3, Hadoop 0.20.2, Hive 
0.5.0
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar
 Fix For: 0.6.0


Creating a table with more than one column fails when the row format SerDe is 
TypedBytesSerDe. 


{code}
hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe   

FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, Size: 
1   
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask  
hive> 
{code}



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simultaneously

2010-05-14 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867693#action_12867693
 ] 

Arvind Prabhakar commented on HIVE-80:
--

I wanted to fix this JIRA, so I started looking at it. From what I have 
observed, it appears that the {{HiveServer}} *is* multi-thread capable. 
Specifically:

* The {{HiveServer}} uses a {{TThreadPoolServer}}, which is multi-threaded.
* The {{ThriftHiveProcessorFactory}} overrides the {{getProcessor()}} call and 
returns a new instance of {{HiveServerHandler}} on every invocation.
* Every instance of {{HiveServerHandler}} has its own thread-local session 
state and a private driver instance.
* Query execution is thread safe thanks to HIVE-77.

Given the above, I believe this JIRA should be resolved and closed. If you 
think I missed something in my analysis, please point it out.
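
As a rough, plain-Java stand-in for this arrangement (illustrative only, not 
Hive's actual Thrift code; every name below is hypothetical except the Hive 
and Thrift classes referenced in the comments):

{code}
// Sketch: a thread-pool "server" where every connection gets its own handler
// and session state is thread-local, mirroring the TThreadPoolServer +
// getProcessor() + thread-local SessionState arrangement described above.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerConnectionHandlerSketch {
  // Each worker thread sees its own session state (cf. Hive's SessionState).
  private static final ThreadLocal<StringBuilder> SESSION =
      ThreadLocal.withInitial(StringBuilder::new);

  // A fresh handler per connection (cf. ThriftHiveProcessorFactory's
  // getProcessor() returning a new HiveServerHandler on every call).
  static class Handler implements Runnable {
    private final int connectionId;
    Handler(int connectionId) { this.connectionId = connectionId; }
    @Override public void run() {
      SESSION.get().setLength(0); // pool threads are reused; reset our state
      SESSION.get().append("session-for-conn-").append(connectionId);
      System.out.println(Thread.currentThread().getName() + ": " + SESSION.get());
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4); // cf. TThreadPoolServer
    for (int i = 0; i < 8; i++) {
      pool.execute(new Handler(i)); // no mutable state shared across handlers
    }
    pool.shutdown();
  }
}
{code}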

> Allow Hive Server to run multiple queries simultaneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Attachments: hive_input_format_race-2.patch
>
>
> Can use one driver object per query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #441

2010-05-14 Thread Apache Hudson Server
See 

--
[...truncated 13969 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Out

[jira] Commented: (HIVE-1343) add an interface in RCFile to support concatenation of two files without (de)compression

2010-05-14 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867557#action_12867557
 ] 

Ning Zhang commented on HIVE-1343:
--

Yongqiang, this patch only exposes the FileInputReader to the client, and the 
client has to merge the files locally. That won't scale. What we should do is 
run the merge as a map-only job so that it can run in parallel.

I talked with Dhruba, and he thinks it would be possible to make this a 
map-only job. The idea is to define a new RecordReader that does no 
decompression and does not iterate over records; instead, it iterates over 
the compressed blocks directly.
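
A hedged sketch of that idea, assuming a toy length-prefixed block layout 
purely for illustration (RCFile's actual on-disk format is more involved):

{code}
// Sketch of block-level copying without (de)compression. The length-prefixed
// layout here is an assumption for illustration; the point is only that the
// bytes are moved without ever touching the compression codec.
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class BlockCopySketch {
  // Copies one length-prefixed block from in to out; returns false at EOF.
  static boolean copyBlock(DataInputStream in, DataOutputStream out)
      throws IOException {
    final int length;
    try {
      length = in.readInt();
    } catch (EOFException eof) {
      return false; // no more blocks
    }
    byte[] block = new byte[length];
    in.readFully(block); // raw, still-compressed bytes; never decompressed
    out.writeInt(length);
    out.write(block);
    return true;
  }

  // Appends every block of in to out verbatim, one block per iteration.
  static void concatenate(DataInputStream in, DataOutputStream out)
      throws IOException {
    while (copyBlock(in, out)) {
      // nothing else to do: no codec, no record iteration
    }
  }
}
{code}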

> add an interface in RCFile to support concatenation of two files without 
> (de)compression
> 
>
> Key: HIVE-1343
> URL: https://issues.apache.org/jira/browse/HIVE-1343
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: HIVE-1343.1.patch
>
>
> If two files are concatenated, we need to read each record in these files and 
> write them back to the destination file. The IO cost is mostly unavoidable 
> due to the lack of append functionality in HDFS. However, the CPU cost could 
> be significantly reduced by avoiding compression and decompression of the 
> files.
> The File Format layer should provide an API that implements block-level 
> concatenation. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal : Hive-trunk-h0.18 #442

2010-05-14 Thread Apache Hudson Server
See 




[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867538#action_12867538
 ] 

Edward Capriolo commented on HIVE-1335:
---

Just a strange side note: why is the classpath specified in both build.xml and 
build-common.xml? Do we need it defined in both places?
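
For context, the "one parameter" mentioned in the quoted description below is 
DataNucleus's connection-pool setting; a hedged sketch of the hive-site.xml 
entry, assuming the DBCP-backed pool that the attached jars provide:

{code}
<!-- Sketch only: assumes the commons-dbcp, commons-pool, and
     datanucleus-connectionpool jars are on the classpath. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
</property>
{code}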

> DataNucleus should use connection pooling
> -
>
> Key: HIVE-1335
> URL: https://issues.apache.org/jira/browse/HIVE-1335
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
> commons-pool-1.2.jar, commons-pool.LICENSE, 
> datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
> hive-1335-1.patch.txt, hive-1335.patch.txt
>
>
> Currently, each DataNucleus operation disconnects and reconnects to the 
> MetaStore over JDBC. Queries fail to even EXPLAIN properly when a table has 
> many partitions. This is fixed by enabling one parameter and including 
> several jars.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: hive-1335-1.patch.txt

> DataNucleus should use connection pooling
> -
>
> Key: HIVE-1335
> URL: https://issues.apache.org/jira/browse/HIVE-1335
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
> commons-pool-1.2.jar, commons-pool.LICENSE, 
> datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
> hive-1335-1.patch.txt, hive-1335.patch.txt
>
>
> Currently, each DataNucleus operation disconnects and reconnects to the 
> MetaStore over JDBC. Queries fail to even EXPLAIN properly when a table has 
> many partitions. This is fixed by enabling one parameter and including 
> several jars.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

           Status: Patch Available  (was: Open)
Affects Version/s: 0.5.0

> DataNucleus should use connection pooling
> -
>
> Key: HIVE-1335
> URL: https://issues.apache.org/jira/browse/HIVE-1335
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
> commons-pool-1.2.jar, commons-pool.LICENSE, 
> datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
> hive-1335-1.patch.txt, hive-1335.patch.txt
>
>
> Currently, each DataNucleus operation disconnects and reconnects to the 
> MetaStore over JDBC. Queries fail to even EXPLAIN properly when a table has 
> many partitions. This is fixed by enabling one parameter and including 
> several jars.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1343) add an interface in RCFile to support concatenation of two files without (de)compression

2010-05-14 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1343:
---

Attachment: HIVE-1343.1.patch

> add an interface in RCFile to support concatenation of two files without 
> (de)compression
> 
>
> Key: HIVE-1343
> URL: https://issues.apache.org/jira/browse/HIVE-1343
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: HIVE-1343.1.patch
>
>
> If two files are concatenated, we need to read each record in these files and 
> write them back to the destination file. The IO cost is mostly unavoidable 
> due to the lack of append functionality in HDFS. However, the CPU cost could 
> be significantly reduced by avoiding compression and decompression of the 
> files.
> The File Format layer should provide an API that implements block-level 
> concatenation. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.