[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880061#action_12880061
 ] 

John Sichi commented on HIVE-1405:
--

Oops, you are right Edward...I only skimmed the doc; ! is for shell commands, 
not SQL scripts.  Sorry for the confusion.

We can add a source command for SQL script invocation.


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1383) allow HBase WAL to be disabled

2010-06-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1383:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks John!

> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch, 
> HIVE-1383.4.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.
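
For context, usage might look like the following sketch. The session property
name hive.hbase.wal.enabled is an assumption on my part and should be verified
against the committed patch; the table names are illustrative.

```sql
-- Assumed property name (verify against the committed patch); table names
-- are illustrative.
SET hive.hbase.wal.enabled=false;
INSERT OVERWRITE TABLE hbase_backed_table SELECT * FROM staging_table;
SET hive.hbase.wal.enabled=true;
```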




[jira] Assigned: (HIVE-1305) add progress in join and groupby

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1305:


Assignee: Siying Dong  (was: Paul Yang)

> add progress in join and groupby
> 
>
> Key: HIVE-1305
> URL: https://issues.apache.org/jira/browse/HIVE-1305
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
>
> The operators join and groupby can consume a lot of rows before producing any 
> output. 
> All operators which do not have an output for every input should report 
> progress periodically.
> Currently, it is only being done for ScriptOperator and FilterOperator.
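
A minimal sketch of the idea (illustrative names, not Hive's actual Operator
API): count consumed rows and poke the framework's reporter every N rows, even
when no output row is produced, so the task is not killed for inactivity.

```java
// Illustrative sketch; not Hive's actual Operator API.
class ProgressCounter {
    private final int interval;
    private final Runnable reporter; // e.g. Reporter::progress in MapReduce
    private long rows = 0;

    ProgressCounter(int interval, Runnable reporter) {
        this.interval = interval;
        this.reporter = reporter;
    }

    // Call once per consumed row, even when no output row is produced.
    void onRow() {
        if (++rows % interval == 0) {
            reporter.run();
        }
    }

    long getRows() {
        return rows;
    }
}
```

A join or group-by operator would call onRow() in its processOp-equivalent,
independently of whether it emits anything.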




RE: Hive-Hbase integration problem, ask for help

2010-06-17 Thread Zhou Shuaifeng
I solved this problem by deleting all the previously existing jars and
rebuilding the Hive source. 
But I still don't know the cause of the original problem.

Zhou

-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Friday, June 18, 2010 4:12 AM
To: hive-dev@hadoop.apache.org
Cc: zhaozhifeng 00129982; zhoushuaif...@huawei.com
Subject: Re: Hive-Hbase integration problem, ask for help

I've added this on as extra validation which ought to be added in HIVE-1222.

JVS

On Jun 15, 2010, at 3:59 PM, Basab Maulik wrote:

> I was not able to reproduce this problem on trunk (can't remember the 
> label). The funny thing was both the create table and the insert 
> overwrite worked even though the create table contained the invalid row
format spec.
> 
> Basab
> 
> On Fri, Jun 11, 2010 at 1:33 PM, John Sichi  wrote:
> 
>> You should not be specifying any ROW FORMAT for an HBase table.
>> 
>> From the log in your earlier post, I couldn't tell what was going 
>> wrong; I don't think it contained the full exception stacks.  You 
>> might be able to dig around in the actual log files to find more.
>> 
>> JVS
>> 
>> From: Zhou Shuaifeng [zhoushuaif...@huawei.com]
>> Sent: Thursday, June 10, 2010 7:26 PM
>> To: hive-dev@hadoop.apache.org
>> Cc: 'zhaozhifeng 00129982'
>> Subject: Hive-Hbase integration problem, ask for help
>> 
>> Hi Guys,
>> 
>> I download the hive source from SVN server, build it and try to run 
>> the hive-hbase integration.
>> 
>> It works well on all file-based Hive tables, but on the HBase-based 
>> tables, the 'insert' command can't run successfully. The 'select' 
>> command runs well.
>> 
>> error info is below:
>> 
>> hive> INSERT OVERWRITE TABLE hive_zsf SELECT * FROM zsf WHERE id=3;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator 
>> Starting Job = job_201006081948_0021, Tracking URL =
>> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0021
>> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
>> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0021
>> 2010-06-09 16:05:43,898 Stage-0 map = 0%,  reduce = 0%
>> 2010-06-09 16:06:12,131 Stage-0 map = 100%,  reduce = 100% Ended Job 
>> = job_201006081948_0021 with errors
>> 
>> Task with the most failures(4):
>> -
>> Task ID:
>> task_201006081948_0021_m_00
>> 
>> URL:
>> http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021&tipid=task_201006081948_0021_m_00
>> -
>> 
>> FAILED: Execution Error, return code 2 from 
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> 
>> 
>> 
>> 
>> I create a hbase-based table with hive, put some data into the hbase 
>> table through the hbase shell, and can select data from it through hive:
>> 
>> CREATE TABLE hive_zsf1(id int, name string) ROW FORMAT DELIMITED 
>> FIELDS TERMINATED BY '\t'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") 
>> TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
>> 
>> hbase(main):001:0> scan 'hive_zsf1'
>> ROW  COLUMN+CELL
>> 
>> 1   column=cf1:val, timestamp=1276157509028,
>> value=zsf
>> 2   column=cf1:val, timestamp=1276157539051,
>> value=zzf
>> 3   column=cf1:val, timestamp=1276157548247,
>> value=zw
>> 4   column=cf1:val, timestamp=1276157557115,
>> value=cjl
>> 4 row(s) in 0.0470 seconds
>> hbase(main):002:0>
>> 
>> hive> select * from hive_zsf1 where id=3;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator 
>> Starting Job = job_201006081948_0038, Tracking URL =
>> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0038
>> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
>> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0038
>> 2010-06-11 10:25:42,049 Stage-1 map = 0%,  reduce = 0%
>> 2010-06-11 10:25:45,090 Stage-1 map = 100%,  reduce = 0%
>> 2010-06-11 10:25:48,133 Stage-1 map = 100%,  reduce = 100% Ended Job 
>> = job_201006081948_0038 OK
>> 3   zw
>> Time taken: 13.526 seconds
>> hive>
>> 
>> 
>> 
>> 
>> 

[jira] Assigned: (HIVE-1374) Query compile-only option

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1374:


Assignee: Siying Dong  (was: Paul Yang)

> Query compile-only option
> -
>
> Key: HIVE-1374
> URL: https://issues.apache.org/jira/browse/HIVE-1374
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Paul Yang
>Assignee: Siying Dong
>
> A compile-only option might be useful for helping users quickly prototype 
> queries, fix errors, and do test runs. The proposed change would be adding a 
> -c switch that behaves like -e but only compiles the specified query.




[jira] Assigned: (HIVE-1413) bring a table/partition offline

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1413:


Assignee: Siying Dong  (was: Paul Yang)

> bring a table/partition offline
> ---
>
> Key: HIVE-1413
> URL: https://issues.apache.org/jira/browse/HIVE-1413
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.6.0
>
>
> There should be a way to bring a table/partition offline.
> At that time, no read/write operations should be supported on that table.
> It would be very useful for housekeeping operations.




[jira] Assigned: (HIVE-1385) UDF field() doesn't work

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1385:


Assignee: Siying Dong

> UDF field() doesn't work
> 
>
> Key: HIVE-1385
> URL: https://issues.apache.org/jira/browse/HIVE-1385
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Minor
>
> I tried it against one of my table:
> hive> desc r;
> OK
> key int
> value string
> a string
> hive> select * from r;
> OK
> 4 val_356 NULL
> 4 val_356 NULL
> 484 val_169 NULL
> 484 val_169 NULL
> 2000 val_169 NULL
> 2000 val_169 NULL
> 3000 val_169 NULL
> 3000 val_169 NULL
> 4000 val_125 NULL
> 4000 val_125 NULL
> hive> select *, field(value, 'val_169') from r; 
> OK
> 4 val_356 NULL 0
> 4 val_356 NULL 0
> 484 val_169 NULL 0
> 484 val_169 NULL 0
> 2000 val_169 NULL 0
> 2000 val_169 NULL 0
> 3000 val_169 NULL 0
> 3000 val_169 NULL 0
> 4000 val_125 NULL 0
> 4000 val_125 NULL 0




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880053#action_12880053
 ] 

Edward Capriolo commented on HIVE-1405:
---

{noformat}
[edw...@ec dist]$ echo show tables > a.sql
[edw...@ec dist]$ bin/hive
[edw...@ec dist]$ chmod a+x a.sql 
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_1189860304.txt
[edw...@ec dist]$ pwd
/mnt/data/hive/hive/build/dist
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_310534855.txt
hive> ! /mnt/data/hive/hive/build/dist/a.sql;
/mnt/data/hive/hive/build/dist/a.sql: line 1: show: command not found
Command failed with exit code = 127
{noformat}

! seems to execute bash commands

Don't we want to execute Hive commands inside hive, like add jar?


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-17 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1135-5-patch.txt

Added the join page as well.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
> jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed. Or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880041#action_12880041
 ] 

Carl Steinbach commented on HIVE-1405:
--

* Can someone please add a "CLI" component to JIRA?
* I think it would be good to add "source" as a synonym for "!". MySQL uses 
"source " and "\. ". See 
http://dev.mysql.com/doc/refman/5.0/en/batch-commands.html



> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880033#action_12880033
 ] 

John Sichi commented on HIVE-1405:
--

The new -i will be the equivalent of bash --init-file.


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880032#action_12880032
 ] 

John Sichi commented on HIVE-1405:
--

There is already an equivalent of the source command (exclamation mark)

! test.sql

So that can be used to do the chaining if desired.


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880028#action_12880028
 ] 

Edward Capriolo commented on HIVE-1405:
---

I like Carl's approach. The entire point of the hiverc is to avoid having to 
explicitly invoke anything to add jars.

> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880015#action_12880015
 ] 

Carl Steinbach commented on HIVE-1405:
--

An alternative approach is to add a "source " command to the 
CLIDriver, and to have it load ~/.hiverc first if it is present. A user's 
.hiverc can in turn source a global .hiverc.
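
A hypothetical ~/.hiverc under this scheme might look like the following; all
file names, jar paths, and function names here are illustrative, not prescribed
by the proposal.

```sql
-- Hypothetical ~/.hiverc; paths and names are illustrative.
source /etc/hive/global.hiverc;
add jar /usr/local/hive/aux/my_udfs.jar;
create temporary function my_lower as 'com.example.udf.MyLower';
set hive.exec.compress.output=true;
```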

> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Assigned: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1405:


Assignee: John Sichi

> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880002#action_12880002
 ] 

John Sichi commented on HIVE-1405:
--

I'm thinking of addressing the requirement by adding a -i option to explicitly 
specify an initialization script.  This could be used multiple times to invoke 
multiple initialization scripts in the order they appear on the hive command 
line, e.g.

hive -i /everyone/hiveinit.sql -i /home/jvs/myhiveinit.sql -f test.sql

The new -i option would be compatible with -f, -e, and with interactive console 
(no -f/-e).  The -i scripts would always be run before the -e/-f.

This allows wrapper scripts to build up the global/local .bashrc functionality 
in installation-specific ways; I don't think we want to dictate that within 
Hive itself.

Would this be good enough?
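
To illustrate, a site-specific wrapper script under this proposal might look
like the sketch below. The paths and file names are hypothetical, and the
sketch prints the command it would run rather than exec'ing hive directly.

```shell
#!/bin/sh
# Hypothetical wrapper: layer a global and a per-user init script on top of
# the proposed -i option. Paths are illustrative assumptions.
GLOBAL_INIT=/everyone/hiveinit.sql
USER_INIT="$HOME/.hiveinit.sql"

INIT_ARGS=""
[ -f "$GLOBAL_INIT" ] && INIT_ARGS="$INIT_ARGS -i $GLOBAL_INIT"
[ -f "$USER_INIT" ] && INIT_ARGS="$INIT_ARGS -i $USER_INIT"

# A real wrapper would exec hive here; this sketch just shows the command.
echo "hive$INIT_ARGS $*"
```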


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1288#action_1288
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: I agree with your assessment above. Regarding the count(*), my earlier 
comment was not meant to imply that there exists such a UDAF today, but that it 
might exist in the future. More importantly though, using an empty parameter 
list as an indicator for * would blur the distinction between UDAF(*) and 
UDAF() invocation. This is perhaps one of many ways in which parameter 
overloading could lead to confusion and hard-to-understand code.

I think introducing the {{GenericUDAFResolver2}} interface is a great idea. I 
also like the idea of using a callback to decouple the invocation from the 
parameter list, but am concerned that this could lead to redundant method calls 
and object creation. I am not sure whether that would add any significant 
performance penalty in the long run.

I would love to hear the opinions of others interested in this issue regarding 
this route. If all agree that adding a new interface with a callback for 
parameter discovery is acceptable, I can start working on that patch.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl




[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1412:
-

Status: Open  (was: Patch Available)

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions since all input paths are directories. However, 
> this breaks when the input consists of files (in which case tablesample could 
> be the use case). CombineHiveInputFormat should adjust to the case when the 
> input could also be non-directories. 
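
A hedged sketch of the idea behind such a fix (not the actual
CombineHiveInputFormat code): group input paths by their partition directory
before combining, so no split ever mixes files from two partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch; not the actual CombineHiveInputFormat code.
class SplitGrouper {
    // Maps each input path to a grouping key: the path itself if it is a
    // directory (regular table/partition input), else its parent directory
    // (file-level input, e.g. from tablesample). Files from different
    // partitions therefore land in different groups.
    static Map<String, List<String>> groupByPartition(List<String> paths,
                                                      Predicate<String> isDirectory) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String p : paths) {
            String key = isDirectory.test(p) ? p : parent(p);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    private static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}
```

Combining would then happen within each group independently.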




[jira] Commented: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879998#action_12879998
 ] 

Namit Jain commented on HIVE-1412:
--

The code changes look good - can you add a test for the same ?

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions since all input paths are directories. However, 
> this breaks when the input consists of files (in which case tablesample could 
> be the use case). CombineHiveInputFormat should adjust to the case when the 
> input could also be non-directories. 




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879994#action_12879994
 ] 

John Sichi commented on HIVE-287:
-

For DISTINCT:   we can check the function invocation itself (during semantic 
analysis) by calling supportsDistinct() immediately after instantiating the 
GenericUDAFEvaluator in SemanticAnalyzer.  This allows strict validation to be 
performed.  Or make the method name checkDistinct and allow the UDAF to throw 
the exception itself.  But I agree that in this case it would be cleaner to 
extend the interface, so I'm fine if we go ahead with that in a non-breaking 
fashion.

For COUNT(*):  if you think about it, COUNT(*) really means "ignore all 
columns" not "count all columns".  So I think an empty array actually makes a 
lot of sense here. Can you think of a case where UDAF(*) even makes sense, 
where UDAF != COUNT?  If you don't have access to any per-row data, what can 
you do other than count it?  I'd say we should actually disallow * for anything 
but COUNT, per the SQL standard.

I like your approach to keeping compatibility via instanceof, so if the 
decision ends up being to add the extra parameters, then we should definitely 
use that approach.  However, extension points should always be interfaces (not 
abstract classes) to allow for stuff like dynamic proxies.  So we would need to 
add a new interface GenericUDAFResolver2 (extends GenericUDAFResolver) with the 
new method, and make AbstractGenericUDAFResolver implement both.

Interface evolution is never pretty, but there is an interface design pattern 
which avoids this particular problem.  Imagine if originally we had defined a 
GenericUDAFResolverInput class inside of Hive itself, with a method 
getParameters() returning TypeInfo [].  Hive would instantiate this and pass an 
input object into getEvaluator, and the evaluator would call 
input.getParameters().  This would have allowed us to add a boolean 
isDistinct() method to GenericUDAFResolverInput without breaking anything 
(source or binary) and without needing to add a new interface; old plugins 
would not know about isDistinct() so they wouldn't call it, and new ones could.

I would argue that if we're going to go to the trouble of adding 
GenericUDAFResolver2, then we should build the pattern above into it as well in 
case we need further evolution later on.

p.s. I'm really glad you're working on this one...every few days I try a 
count(*) against Hive accidentally and then kick myself.
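
The design pattern described above can be sketched as follows (illustrative
names, not the real Hive API; String[] stands in for TypeInfo[]):

```java
// Illustrative names only; not the real Hive API.
// Hive instantiates the input object and passes it to getEvaluator();
// accessors like isDistinct() can be added to the input class later without
// breaking old resolvers, which simply never call them.
class GenericUDAFResolverInput {
    private final String[] parameters; // stand-in for TypeInfo[]
    private final boolean distinct;    // added later; old code is unaffected

    GenericUDAFResolverInput(String[] parameters, boolean distinct) {
        this.parameters = parameters;
        this.distinct = distinct;
    }

    String[] getParameters() { return parameters; }
    boolean isDistinct() { return distinct; }
}

interface Resolver2 {
    // Single entry point whose signature never changes; evolution happens
    // inside the input class.
    String getEvaluator(GenericUDAFResolverInput input);
}
```

Because the method signature is fixed, adding a field and accessor to the
input class breaks neither source nor binary compatibility.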


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl




[jira] Commented: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879986#action_12879986
 ] 

HBase Review Board commented on HIVE-1412:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/206/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12447377/HIVE-1412.patch


This addresses bug HIVE-1412.
http://issues.apache.org/jira/browse/HIVE-1412


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
955674 

Diff: http://review.hbase.org/r/206/diff


Testing
---


Thanks,

Carl




> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions since all input paths are directories. However, 
> this breaks when the input consists of files (in which case tablesample could 
> be the use case). CombineHiveInputFormat should adjust to the case when the 
> input could also be non-directories. 




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879983#action_12879983
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: Thanks for reviewing this change. I have some follow-up comments and 
suggestions:

bq. isDistinct: this doesn't actually modify the choice of evaluator 
implementation at all, since the actual duplicate elimination takes place 
upstream of the UDAF invocation. So instead of adding this parameter, can we 
instead add a new method supportsDistinct() on GenericUDAFEvaluator? 

While the evaluation may be happening upstream, I was concerned that it does 
not exclude the cases where this information is relevant to the function 
invocation itself. For example, the implementation of {{count}} requires that 
if there is a valid argument list, it must be qualified with {{DISTINCT}}.

bq. isAllColumns: COUNT is probably the only function which is ever even going 
to care about this one. Couldn't we just use an empty array of TypeInfo to 
indicate all columns?

I had a similar idea, but after some consideration opted for a simpler design. 
I felt that overloading arguments to indicate special cases might lead to 
confusion, and eventually to problems when a use case emerges that invalidates 
this assumption. 

I do agree with your point that it will be good to stay compatible if possible. 
One way to do it would be as follows:

# Revert the {{GenericUDAFResolver}} to its previous state but make the 
interface deprecated in favor of the abstract base class.
# Push the newly introduced method into {{AbstractGenericUDAFResolver}} 
implementation.
# Modify {{FunctionRegistry.getGenericUDAFEvaluator()}} method to test the 
resolver instance to be type compatible with {{AbstractGenericUDAFResolver}} 
and if so, invoke the new method. Otherwise revert to the old mechanism.

What do you think about this approach?
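The three-step scheme above can be sketched as follows. This is a minimal illustration under simplified, made-up names (Resolver, AbstractResolver, lookup stand in for GenericUDAFResolver, AbstractGenericUDAFResolver, and FunctionRegistry.getGenericUDAFEvaluator()); it is not the Hive code, only the shape of the compatibility pattern.

```java
// Sketch of the backward-compatible evolution: deprecate the old interface,
// add the new method on an abstract base class, and have the registry
// dispatch on the concrete type.
public class ResolverCompat {

    /** Old-style interface, retained (deprecated) so existing plugins still compile. */
    @Deprecated
    interface Resolver {
        String getEvaluator(String[] argTypes);
    }

    /** New abstract base class carrying the richer lookup method. */
    abstract static class AbstractResolver implements Resolver {
        /** New method with the extra flags; the default delegates to the old one. */
        String getEvaluator(String[] argTypes, boolean isDistinct, boolean isAllColumns) {
            return getEvaluator(argTypes);
        }
    }

    /** Registry-side dispatch: use the new method when the resolver supports it. */
    static String lookup(Resolver r, String[] argTypes, boolean distinct, boolean allCols) {
        if (r instanceof AbstractResolver) {
            return ((AbstractResolver) r).getEvaluator(argTypes, distinct, allCols);
        }
        return r.getEvaluator(argTypes); // fall back to the old mechanism
    }

    public static void main(String[] args) {
        Resolver legacy = types -> "legacy-evaluator"; // an un-migrated plugin
        System.out.println(lookup(legacy, new String[0], true, false));
    }
}
```

Old plugins keep compiling against the deprecated interface, while migrated ones opt into the flag-aware method simply by extending the base class.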


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: HIVE-1412: CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/206/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12447377/HIVE-1412.patch


This addresses bug HIVE-1412.
http://issues.apache.org/jira/browse/HIVE-1412


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
955674 

Diff: http://review.hbase.org/r/206/diff


Testing
---


Thanks,

Carl



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1412:
-

Component/s: Query Processor

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split but should not takes files cross partition boundary. This works for 
> regular table and partitions since all input paths are directory. However 
> this breaks when the input is files (in which case tablesample could be the 
> use case). CombineHiveInputFormat should adjust to the case when input could 
> also be non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1401) Web Interface can only browse default

2010-06-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879979#action_12879979
 ] 

HBase Review Board commented on HIVE-1401:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/205/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12446827/HIVE-1401-1-patch.txt


This addresses bug HIVE-1401.
http://issues.apache.org/jira/browse/HIVE-1401


Diffs
-

  trunk/hwi/web/index.jsp 953471 
  trunk/hwi/web/session_result.jsp 953471 
  trunk/hwi/web/set_processor.jsp 953471 
  trunk/hwi/web/show_database.jsp 953471 

Diff: http://review.hbase.org/r/205/diff


Testing
---


Thanks,

Carl




> Web Interface can only browse default
> 
>
> Key: HIVE-1401
> URL: https://issues.apache.org/jira/browse/HIVE-1401
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: HIVE-1401-1-patch.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879978#action_12879978
 ] 

HBase Review Board commented on HIVE-1369:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/202/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12447394/HIVE-1369.svn.patch


This addresses bug HIVE-1369.
http://issues.apache.org/jira/browse/HIVE-1369


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 
211c733 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 
6db9bc8 

Diff: http://review.hbase.org/r/202/diff


Testing
---


Thanks,

Carl




> LazySimpleSerDe should be able to read classes that support some form of 
> toString()
> ---
>
> Key: HIVE-1369
> URL: https://issues.apache.org/jira/browse/HIVE-1369
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
> objects.  It should be pretty easy to extend the class to read any object 
> that implements toString() method.
> Ideas or concerns?
> Alex K
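The fallback Alex proposes can be sketched without any Hadoop dependency, as below. Here byte[] and CharSequence are stand-ins for Hadoop's BytesWritable and Text; the class and method names are invented for this illustration.

```java
// Illustrative sketch: accept any row object and fall back to toString()
// when it is neither raw bytes nor already text.
public class ToStringRows {
    static String asRowText(Object row) {
        if (row instanceof byte[]) {  // like BytesWritable
            return new String((byte[]) row, java.nio.charset.StandardCharsets.UTF_8);
        }
        if (row instanceof CharSequence) {  // like Text
            return row.toString();
        }
        // Generic fallback: any object with a usable toString().
        return String.valueOf(row);
    }

    public static void main(String[] args) {
        System.out.println(asRowText(new byte[] {104, 105}));  // bytes for "hi"
        System.out.println(asRowText(java.time.Duration.ofSeconds(3)));
    }
}
```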

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: HIVE-1401: Web Interface can only browse default

2010-06-17 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/205/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12446827/HIVE-1401-1-patch.txt


This addresses bug HIVE-1401.
http://issues.apache.org/jira/browse/HIVE-1401


Diffs
-

  trunk/hwi/web/index.jsp 953471 
  trunk/hwi/web/session_result.jsp 953471 
  trunk/hwi/web/set_processor.jsp 953471 
  trunk/hwi/web/show_database.jsp 953471 

Diff: http://review.hbase.org/r/205/diff


Testing
---


Thanks,

Carl



Review Request: HIVE-1369: LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-17 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/202/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12447394/HIVE-1369.svn.patch


This addresses bug HIVE-1369.
http://issues.apache.org/jira/browse/HIVE-1369


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 
211c733 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 
6db9bc8 

Diff: http://review.hbase.org/r/202/diff


Testing
---


Thanks,

Carl



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1369:
-

Attachment: HIVE-1369.svn.patch

Looks like the original patch was generated with "git diff" without the 
--no-prefix switch. This causes patch to barf. HIVE-1369.svn.patch is an 
updated copy that applies cleanly with patch.

> LazySimpleSerDe should be able to read classes that support some form of 
> toString()
> ---
>
> Key: HIVE-1369
> URL: https://issues.apache.org/jira/browse/HIVE-1369
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
> objects.  It should be pretty easy to extend the class to read any object 
> that implements toString() method.
> Ideas or concerns?
> Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1369:
-

Affects Version/s: 0.5.0
  Component/s: Serializers/Deserializers

> LazySimpleSerDe should be able to read classes that support some form of 
> toString()
> ---
>
> Key: HIVE-1369
> URL: https://issues.apache.org/jira/browse/HIVE-1369
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
> objects.  It should be pretty easy to extend the class to read any object 
> that implements toString() method.
> Ideas or concerns?
> Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879962#action_12879962
 ] 

HBase Review Board commented on HIVE-1271:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/200/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12440030/HIVE-1271-1.patch


This addresses bug HIVE-1271.
http://issues.apache.org/jira/browse/HIVE-1271


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/ListTypeInfo.java 
cb2fa57 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/MapTypeInfo.java 
a426e74 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeInfo.java 
3d1c68e 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/StructTypeInfo.java 
87179aa 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.java 0344718 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/package-info.java 
PRE-CREATION 

Diff: http://review.hbase.org/r/200/diff


Testing
---


Thanks,

Carl




> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, causing a type mismatch during query semantic analysis. The 
> following REDUCE query, where the field name is "userId", failed:
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY<STRUCT<userId:INT>>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY<STRUCT<userId:INT>>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array<struct<userid:int>> to
> array<struct<userId:int>>.
> The same query worked fine after changing "userId" to "userid".
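The failure mode above boils down to how field names are compared. A minimal sketch (invented names, not the Hive code): the declared field name is lower-cased during parsing while the other schema keeps its original case, so a case-sensitive comparison reports a mismatch that a case-insensitive one would accept.

```java
// Illustrative sketch of the case-sensitivity mismatch behind the error.
public class FieldNameCompare {
    static boolean sameField(String declared, String actual, boolean caseSensitive) {
        return caseSensitive ? declared.equals(actual) : declared.equalsIgnoreCase(actual);
    }

    public static void main(String[] args) {
        // The parser lower-cased "userId" to "userid" on one side only:
        System.out.println(sameField("userid", "userId", true));   // rejected
        System.out.println(sameField("userid", "userId", false));  // accepted
    }
}
```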

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: HIVE-1271: Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-17 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/200/
---

Review request for Hive Developers.


Summary
---

Review for 
https://issues.apache.org/jira/secure/attachment/12440030/HIVE-1271-1.patch


This addresses bug HIVE-1271.
http://issues.apache.org/jira/browse/HIVE-1271


Diffs
-

  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/ListTypeInfo.java 
cb2fa57 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/MapTypeInfo.java 
a426e74 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeInfo.java 
3d1c68e 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/StructTypeInfo.java 
87179aa 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.java 0344718 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/package-info.java 
PRE-CREATION 

Diff: http://review.hbase.org/r/200/diff


Testing
---


Thanks,

Carl



Re: Hive support to cassandra

2010-06-17 Thread Jeff Hammerbacher
Hey Tom,

Well, I was being a bit short, and for that I apologize. To elaborate:
Cassandra was conceived of as a solution for a vastly different problem than
data warehousing, and certain design decisions in the early days were made
in light of the needs of OLTP data management. To the best of my knowledge,
its primary users and contributors have continued that focus. The
integration with Hadoop MapReduce is primarily useful for bulk import and
export, as well as for facilitating data hygiene by making bulk
transformations possible (e.g. recoding a column or enforcing a consistency
constraint in an asynchronous fashion).

More generally, OLTP ("application data management") and data warehousing
("analytical data management") are two very different beasts, and to expect
a single storage system to be optimal for both kinds of workloads is one
place where I feel things went a bit wrong in the RDBMS world. I'm hopeful
that we can avoid some of that confusion with these next generation storage
systems, though the temptation of making both workloads happen in a single
system is likely too large to be avoided. Something like
https://issues.apache.org/jira/browse/HBASE-2357 may be helpful here if you
insist on making both workloads happen in a single system.

In any case, using Hive against an RCFile in HDFS is probably the best way
to go in the short term for the data warehouse, as both the HBase and
Cassandra support in Hive are experimental.

Regards,
Jeff

On Wed, Jun 16, 2010 at 9:14 PM, tom kersnick  wrote:

> You are not being rude Jeff.  This is a request from the client due to ease
> of use of Cassandra compared to Hbase.  I'm with you on this.  They are
> looking for apples to apples consistency.  Easy migration of data from OLTP
> (Cassandra) to their Data Warehouse (Cassandra?).  Apparently not.  Is it
> possible to migrate from Cassandra to Hbase?  Any documentation on this
> type
> of push to Hbase from Cassandra would be helpful.
>
> Thanks in advance.
>
> /tom
>
>
>
>
>
> On Wed, Jun 16, 2010 at 5:44 PM, Jeff Hammerbacher  >wrote:
>
> > Hey Tom,
> >
> > I don't want to be rude, but if you're using Cassandra for your data
> > warehouse environment, you're doing it wrong. HBase is the primary focus
> > for
> > integration with Hive (see
> > http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/,
> > example). Cassandra is a great choice for an OLTP application, but
> > certainly
> > not for a data warehouse.
> >
> > Later,
> > Jeff
> >
> > On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick 
> wrote:
> >
> > > Quick question for all of you.  It seems that there is more movement
> > using
> > > Hive with Hbase rather than Cassandra.  Do you see this changing in the
> > > near
> > > future?  I have a client who is interested in using Cassandra due to
> the
> > > ease of maintenance.  They are planning on using Cassandra for both
> their
> > > data warehouse and OLTP environments.  Thoughts?
> > >
> > > I saw this ticket and I wanted to ask.
> > >
> > > Thanks in advance.
> > >
> > > /tom
> > >
> > >
> > > On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo <
> edlinuxg...@gmail.com
> > > >wrote:
> > >
> > > > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> > > wrote:
> > > >
> > > > > > All,
> > > > > >
> > > > > > http://code.google.com/soc/.
> > > > > >
> > > > > > It is an interesting thing that Google offers stipends to get
> open
> > > > source
> > > > > > code written. However, last year I was interested in a
> project
> > > that
> > > > > did
> > > > > > NOT get accepted into GSOC. It was quite deflating to not be
> > > > > > accepted.
> > > > > >
> > > > > > Money does make the world go around, and if we all had plenty of
> > > money
> > > > we
> > > > > > would all have more time to write open source code :) But on the
> > > chance
> > > > > > your
> > > > > > application does get rejected consider doing it anyway!
> > > > > >
> > > > > > Edward
> > > > > >
> > > > >
> > > > > Definitely Edward, Thanks for the suggestion :)
> > > > >
> > > > > shirish
> > > > >
> > > >
> > > > I did not see any cassandra or hive SOC projects at
> > > > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> > > > So if no one is going to pick this cassandra interface up, I will pick
> > > > it up after I close some pending things. That is two strikes for me and
> > > > GSOC.
> > > >
> > >
> >
>


[jira] Updated: (HIVE-1211) Tapping logs from child processes

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1211:
-

Status: Open  (was: Patch Available)

Hi bc, sorry for the delay in looking at this. Can you please rebase this patch 
against trunk and resubmit? Thanks!

> Tapping logs from child processes
> -
>
> Key: HIVE-1211
> URL: https://issues.apache.org/jira/browse/HIVE-1211
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: bc Wong
>Assignee: bc Wong
> Fix For: 0.6.0
>
> Attachments: HIVE-1211.1.patch
>
>
> Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
> the parent's stdout/stderr. There is little one can do to to sort out which 
> log is from which query.
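One way to make the interleaved output attributable is to tag each line read from a child's stream with a query identifier. The sketch below is illustrative only (the class and method names are invented, and any InputStream stands in for the MapRedTask child's stdout/stderr); it is not the approach taken in the attached patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Illustrative sketch: prefix every line from a child process's output
// stream with a query identifier so interleaved parent logs can be
// attributed to the query that produced them.
public class LogTap {
    static String tag(InputStream childOutput, String queryId) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(childOutput))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append('[').append(queryId).append("] ").append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        InputStream fake = new java.io.ByteArrayInputStream("map 100%\nreduce 50%\n".getBytes());
        System.out.print(tag(fake, "query-42"));
    }
}
```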

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1095) Hive in Maven

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1095:
-

Status: Open  (was: Patch Available)

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1095) Hive in Maven

2010-06-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879950#action_12879950
 ] 

Carl Steinbach commented on HIVE-1095:
--

The patch applies cleanly, but I get the following error when I run the 
prepare-maven-publish target:

{code}

% ant maven-publish-artifact
Buildfile: /Users/carl/Projects/hive/build.xml

ant-task-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.0/maven-ant-tasks-2.1.0.jar
  [get] To: /Users/carl/Projects/hive/build/maven-ant-tasks-2.1.0.jar

mvn-taskdef:

maven-publish-artifact:
[artifact:pom] An error has occurred while processing the Maven artifact tasks.
[artifact:pom]  Diagnosis:
[artifact:pom] 
[artifact:pom] Unable to initialize POM hive-${hive.project}-0.6.0.pom: Could 
not find the model file 
'/Users/carl/Projects/hive/build/maven/poms/hive-${hive.project}-0.6.0.pom'. 
for project unknown
[artifact:pom] 
/Users/carl/Projects/hive/build/maven/poms/hive-${hive.project}-0.6.0.pom (No 
such file or directory)

BUILD FAILED
/Users/carl/Projects/hive/build.xml:410: Unable to initialize POM 
hive-${hive.project}-0.6.0.pom:
Could not find the model file 
'/Users/carl/Projects/hive/build/maven/poms/hive-${hive.project}-0.6.0.pom'. 
for project unknown

Total time: 5 seconds
{code}

It looks like the make-pom target needs to depend on ivy-init?


> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879943#action_12879943
 ] 

John Sichi commented on HIVE-1387:
--

Two refinements:

1) Rather than subclassing GenericUDAFHistogramNumeric directly, I would 
recommend factoring out commonality (either into an abstract base, or into a 
separate reusable class representing the histogram component).

2) For the new percentile generic UDAF, use the defaults you describe, but 
allow the user to override (i.e. to be able to choose approx even for integers, 
or to be able to choose exact for floating point).  And for approx, provide 
override in addition to your default on the number of bins.
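The exact-versus-approximate split being discussed can be sketched as below. This is an assumption-laden illustration, not the Hive UDAF code: exact percentile by sorting (a plausible default for integers) and an approximate percentile read off an equal-width histogram whose bin count is a caller-supplied override (a plausible default for doubles).

```java
import java.util.Arrays;

// Illustrative sketch of exact vs. histogram-approximate percentiles.
public class Percentiles {
    /** Exact: sort and index (nearest-rank method). */
    static double exact(double[] values, double p) {
        double[] s = values.clone();
        Arrays.sort(s);
        int idx = (int) Math.ceil(p * s.length) - 1;
        return s[Math.max(idx, 0)];
    }

    /** Approximate: equal-width histogram with an overridable bin count. */
    static double approx(double[] values, double p, int bins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        long[] counts = new long[bins];
        for (double x : values) {
            int b = width == 0 ? 0 : (int) Math.min((x - min) / width, bins - 1);
            counts[b]++;
        }
        long target = (long) Math.ceil(p * values.length);
        long seen = 0;
        for (int b = 0; b < bins; b++) {
            seen += counts[b];
            if (seen >= target) {
                return min + (b + 0.5) * width;  // estimate by bin midpoint
            }
        }
        return max;
    }
}
```

More bins trade memory for accuracy, which is why exposing the bin count as an override (rather than hard-coding a default) matters.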


> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1056) Predicate push down does not work with UDTF's

2010-06-17 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879932#action_12879932
 ] 

HBase Review Board commented on HIVE-1056:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/198/
---

Review request for Hive Developers.


Summary
---

Review for patch 
https://issues.apache.org/jira/secure/attachment/12441230/HIVE-1056.1.patch


This addresses bug HIVE-1056.
http://issues.apache.org/jira/browse/HIVE-1056


Diffs
-

  
contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
 PRE-CREATION 
  contrib/src/test/queries/clientpositive/udtf_output_on_close.q PRE-CREATION 
  contrib/src/test/results/clientpositive/udtf_output_on_close.q.out 
PRE-CREATION 
  ql/if/queryplan.thrift d387e8e 
  ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 
99d1f2d 
  ql/src/gen-php/queryplan_types.php 334b4f8 
  ql/src/gen-py/queryplan/ttypes.py d228e68 
  ql/src/java/org/apache/hadoop/hive/ql/exec/LateralViewForwardOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/LateralViewJoinOperator.java 
371a7ac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 03bd0bb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java cab3057 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a2f9dba 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LateralViewForwardDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 6e1dfe4 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicatePushDown.java 19fe5f4 
  ql/src/test/queries/clientpositive/lateral_view_ppd.q PRE-CREATION 
  ql/src/test/results/clientpositive/lateral_view.q.out 8931455 
  ql/src/test/results/clientpositive/lateral_view_ppd.q.out PRE-CREATION 

Diff: http://review.hbase.org/r/198/diff


Testing
---


Thanks,

Carl




> Predicate push down does not work with UDTF's
> -
>
> Key: HIVE-1056
> URL: https://issues.apache.org/jira/browse/HIVE-1056
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0
>
> Attachments: HIVE-1056.1.patch
>
>
> Predicate push down does not work with UDTF's in lateral views
> {code}
> hive> SELECT * FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS k WHERE 
> k=1;
> FAILED: Unknown exception: null
> hive>
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: HIVE-1056: Predicate push down does not work with UDTF's

2010-06-17 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/198/
---

Review request for Hive Developers.


Summary
---

Review for patch 
https://issues.apache.org/jira/secure/attachment/12441230/HIVE-1056.1.patch


This addresses bug HIVE-1056.
http://issues.apache.org/jira/browse/HIVE-1056


Diffs
-

  
contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFCount2.java
 PRE-CREATION 
  contrib/src/test/queries/clientpositive/udtf_output_on_close.q PRE-CREATION 
  contrib/src/test/results/clientpositive/udtf_output_on_close.q.out 
PRE-CREATION 
  ql/if/queryplan.thrift d387e8e 
  ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 
99d1f2d 
  ql/src/gen-php/queryplan_types.php 334b4f8 
  ql/src/gen-py/queryplan/ttypes.py d228e68 
  ql/src/java/org/apache/hadoop/hive/ql/exec/LateralViewForwardOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/LateralViewJoinOperator.java 
371a7ac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 03bd0bb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java cab3057 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a2f9dba 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LateralViewForwardDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 6e1dfe4 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicatePushDown.java 19fe5f4 
  ql/src/test/queries/clientpositive/lateral_view_ppd.q PRE-CREATION 
  ql/src/test/results/clientpositive/lateral_view.q.out 8931455 
  ql/src/test/results/clientpositive/lateral_view_ppd.q.out PRE-CREATION 

Diff: http://review.hbase.org/r/198/diff


Testing
---


Thanks,

Carl



[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879929#action_12879929
 ] 

John Sichi commented on HIVE-287:
-

Sorry to chime in late, but I have one big question: can we instead do this in 
a way that does not break the UDAF interface?

The existing patch adds a new method to the GenericUDAFResolver interface, 
meaning all existing plugin implementations outside of the Hive codebase will 
fail to compile (due to the fact that we did not already have the insulating 
abstract base class available).  We already have some of these within Facebook.

Let's analyze the two new parameters one by one.

isDistinct:  this doesn't actually modify the choice of evaluator 
implementation at all, since the actual duplicate elimination takes place 
upstream of the UDAF invocation.  So instead of adding this parameter, can we 
instead add a new method supportsDistinct() on GenericUDAFEvaluator?  Then call 
this after instantiating the new evaluator in order to carry out the additional 
validation.

isAllColumns:  COUNT(*) is probably the only function which is ever even going 
to care about this one.  Couldn't we just use an empty array of TypeInfo to 
indicate all columns?

Independent of the above, I think adding the insulating abstract base should 
still be done now to make future transitions smoother when interface-breaking 
is absolutely required.  So keep that part of the patch.
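The alternative sketched in this comment could look roughly like the following. These are simplified stand-in names, not the Hive classes: leave the resolver signature alone, add supportsDistinct() to the evaluator, and let an empty parameter-type array denote COUNT(*)'s "all columns".

```java
// Illustrative sketch of the interface-preserving alternative.
public class EvaluatorChecks {
    static class Evaluator {
        boolean supportsDistinct() { return false; }  // conservative default
    }

    static class CountEvaluator extends Evaluator {
        @Override
        boolean supportsDistinct() { return true; }   // COUNT handles DISTINCT
    }

    /** Called after instantiating the evaluator, as suggested above. */
    static void validate(Evaluator e, boolean queryUsesDistinct) {
        if (queryUsesDistinct && !e.supportsDistinct()) {
            throw new IllegalArgumentException("DISTINCT is not supported by this UDAF");
        }
    }

    /** An empty parameter-type array denotes "all columns" (COUNT(*)). */
    static boolean isAllColumns(String[] paramTypes) {
        return paramTypes.length == 0;
    }

    public static void main(String[] args) {
        validate(new CountEvaluator(), true);  // passes validation
        System.out.println(isAllColumns(new String[0]));
    }
}
```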


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1413) bring a table/partition offline

2010-06-17 Thread Namit Jain (JIRA)
bring a table/partition offline
---

 Key: HIVE-1413
 URL: https://issues.apache.org/jira/browse/HIVE-1413
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Namit Jain
Assignee: Paul Yang
 Fix For: 0.6.0


There should be a way to bring a table/partition offline.
While it is offline, no read/write operations should be supported on that table.

This would be very useful for housekeeping operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-06-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1019:
-

Status: Open  (was: Patch Available)

> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> 
>
> Key: HIVE-1019
> URL: https://issues.apache.org/jira/browse/HIVE-1019
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
> HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
> HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt
>
>
> I keep getting errors like this:
> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> and :
> java.io.IOException: cannot find dir = 
> hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
> partToPartitionInfo!
> when running multiple threads with roughly similar queries.
> I have a patch for this which works for me.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1222) in metastore, do not store names of inputformat/outputformat/serde for non-native tables

2010-06-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879910#action_12879910
 ] 

John Sichi commented on HIVE-1222:
--

We should probably also prevent supplying ROW FORMAT together with STORED BY; 
see this hive-dev thread:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-dev/201006.mbox/browser


> in metastore, do not store names of inputformat/outputformat/serde for 
> non-native tables
> 
>
> Key: HIVE-1222
> URL: https://issues.apache.org/jira/browse/HIVE-1222
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler, Metastore, Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
>
> Instead, store null and get them dynamically from the storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Hive-Hbase integration problem, ask for help

2010-06-17 Thread John Sichi
I've noted this as extra validation which ought to be added in HIVE-1222.

JVS
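For reference, the DDL quoted further down in this thread combines ROW FORMAT DELIMITED with STORED BY. With that validation in place, an HBase-backed table would be declared without the ROW FORMAT clause, along these lines (reusing the table and column mapping from the quoted example; treat this as a sketch, not canonical DDL):

```sql
-- Same table as in the quoted report, minus the ROW FORMAT clause that the
-- HBase storage handler ignores (and that HIVE-1222 would reject outright).
CREATE TABLE hive_zsf1(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
```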

On Jun 15, 2010, at 3:59 PM, Basab Maulik wrote:

> I was not able to reproduce this problem on trunk (can't remember the
> label). The funny thing was both the create table and the insert overwrite
> worked even though the create table contained the invalid row format spec.
> 
> Basab
> 
> On Fri, Jun 11, 2010 at 1:33 PM, John Sichi  wrote:
> 
>> You should not be specifying any ROW FORMAT for an HBase table.
>> 
>> From the log in your earlier post, I couldn't tell what was going wrong; I
>> don't think it contained the full exception stacks.  You might be able to dig
>> around in the actual log files to find more.
>> 
>> JVS
>> 
>> From: Zhou Shuaifeng [zhoushuaif...@huawei.com]
>> Sent: Thursday, June 10, 2010 7:26 PM
>> To: hive-dev@hadoop.apache.org
>> Cc: 'zhaozhifeng 00129982'
>> Subject: Hive-Hbase integration problem, ask for help
>> 
>> Hi Guys,
>> 
>> I downloaded the Hive source from the SVN server, built it, and tried to run
>> the Hive-HBase integration.
>> 
>> It works well on all file-based Hive tables, but on the HBase-based tables,
>> the 'insert' command can't run successfully. The 'select' command runs
>> fine.
>> 
>> error info is below:
>> 
>> hive> INSERT OVERWRITE TABLE hive_zsf SELECT * FROM zsf WHERE id=3;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201006081948_0021, Tracking URL =
>> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0021
>> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
>> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0021
>> 2010-06-09 16:05:43,898 Stage-0 map = 0%,  reduce = 0%
>> 2010-06-09 16:06:12,131 Stage-0 map = 100%,  reduce = 100%
>> Ended Job = job_201006081948_0021 with errors
>> 
>> Task with the most failures(4):
>> -
>> Task ID:
>> task_201006081948_0021_m_00
>> 
>> URL:
>> http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021
>> <
>> http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021&tipid=tas
>> k_201006081948_0021_m_00>
>> &tipid=task_201006081948_0021_m_00
>> -
>> 
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> 
>> 
>> 
>> 
>> I create a hbase-based table with hive, put some data into the hbase table
>> through the hbase shell, and can select data from it through hive:
>> 
>> CREATE TABLE hive_zsf1(id int, name string) ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\t'
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>> TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
>> 
>> hbase(main):001:0> scan 'hive_zsf1'
>> ROW  COLUMN+CELL
>> 
>> 1   column=cf1:val, timestamp=1276157509028,
>> value=zsf
>> 2   column=cf1:val, timestamp=1276157539051,
>> value=zzf
>> 3   column=cf1:val, timestamp=1276157548247,
>> value=zw
>> 4   column=cf1:val, timestamp=1276157557115,
>> value=cjl
>> 4 row(s) in 0.0470 seconds
>> hbase(main):002:0>
>> 
>> hive> select * from hive_zsf1 where id=3;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201006081948_0038, Tracking URL =
>> http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0038
>> Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
>> -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0038
>> 2010-06-11 10:25:42,049 Stage-1 map = 0%,  reduce = 0%
>> 2010-06-11 10:25:45,090 Stage-1 map = 100%,  reduce = 0%
>> 2010-06-11 10:25:48,133 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201006081948_0038
>> OK
>> 3   zw
>> Time taken: 13.526 seconds
>> hive>
>> 
>> 
>> 
>> 
>> 
>> 
>> -
>> This e-mail and its attachments contain confidential information from
>> HUAWEI, which
>> is intended only for the person or entity whose address is listed above.
>> Any
>> use of the
>> information contained herein in any way (including, but not limited to,
>> total or partial
>> disclosure, reproduction, or dissemination) by persons other than the
>> intended
>> recipient(s) is prohibited. If you receive this e-mail in error, please
>> notify the sender by
>> phone or email immediately and delete it!
>> 
>> 
>> 



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1412:
-

Attachment: HIVE-1412.patch

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions, since all input paths are directories. However, 
> it breaks when the inputs are files (as can happen with tablesample). 
> CombineHiveInputFormat should handle the case where inputs are 
> non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1412:
-

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions, since all input paths are directories. However, 
> it breaks when the inputs are files (as can happen with tablesample). 
> CombineHiveInputFormat should handle the case where inputs are 
> non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-17 Thread Ning Zhang (JIRA)
CombineHiveInputFormat bug on tablesample
-

 Key: HIVE-1412
 URL: https://issues.apache.org/jira/browse/HIVE-1412
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


CombineHiveInputFormat should combine all files inside one partition to form a 
split, but should not take files across partition boundaries. This works for 
regular tables and partitions, since all input paths are directories. However, it 
breaks when the inputs are files (as can happen with tablesample). 
CombineHiveInputFormat should handle the case where inputs are 
non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-17 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879894#action_12879894
 ] 

Mayank Lahiri commented on HIVE-1387:
-

This is what I suggest we do to resolve this issue:

1. Create a new percentile_approx() function that overrides 
GenericUDAFHistogramNumeric to approximate a fine-grained histogram with many 
bins (say 10,000 for example, but I'll run some experiments), and then use the 
histogram to estimate the percentile value.

2. Convert the existing simple percentile() UDAF to a generic UDAF. When the 
input is byte, short, int, or long, then use the existing code (with some 
modifications, like converting the linear scan to a binary search). When the 
input is float or double, then automatically use the percentile_approx() 
function. 
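The estimation step in suggestion 1 can be sketched as follows. This uses a plain fixed-width histogram for illustration; the actual proposal builds on GenericUDAFHistogramNumeric, which uses an adaptive histogram, so treat the bucketing here as a hypothetical simplification of the real approach.

```java
import java.util.Arrays;

public class ApproxPercentile {
    // Estimate the q-th percentile (0 < q < 1) of data from a fixed-width
    // histogram with the given number of buckets. Illustration only; the
    // proposed percentile_approx() would reuse Hive's adaptive histogram.
    static double estimate(double[] data, double q, int bins) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / bins;
        if (width == 0) return min;  // all values identical

        long[] counts = new long[bins];
        for (double x : data) {
            counts[(int) Math.min(bins - 1, (x - min) / width)]++;
        }

        double target = q * data.length;
        double cum = 0;
        for (int b = 0; b < bins; b++) {
            if (cum + counts[b] >= target) {
                // Interpolate linearly within the bucket.
                double frac = (target - cum) / counts[b];
                return min + (b + frac) * width;
            }
            cum += counts[b];
        }
        return max;
    }

    public static void main(String[] args) {
        double[] data = new double[10000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // Median estimate; the exact value is 4999.5.
        System.out.println(estimate(data, 0.5, 100));
    }
}
```

More buckets tighten the estimate: the 10,000-bin suggestion trades a larger intermediate aggregation buffer for accuracy.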

> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-543) provide option to run hive in local mode

2010-06-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-543:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Joy

> provide option to run hive in local mode
> 
>
> Key: HIVE-543
> URL: https://issues.apache.org/jira/browse/HIVE-543
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Fix For: 0.6.0
>
> Attachments: hive-534.patch.2, hive-543.patch.1
>
>
> this is a little bit more than just mapred.job.tracker=local
> when run in this mode - multiple jobs are an issue since writing to same tmp 
> directories is an issue. the following options:
> hadoop.tmp.dir
> mapred.local.dir
> need to be randomized (perhaps based on queryid). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Vertical partitioning

2010-06-17 Thread Ashish Thusoo
If you are querying this data again and again, you could just create another 
table that has only those 10 columns (more like a materialized view approach, 
though materialized views are not in Hive yet). This of course uses up some 
space compared to vertical partitioning, but if the RCFile performance is not 
good enough, this could be the workaround for now. Also, do you see a lot more 
time spent on I/O in your queries?

Ashish

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma < 
jaydeep.vishwaka...@mkhoj.com> wrote:

> Just looking at the opportunity and feasibility for it. One of my tables 
> has more than 20 fields, where most of the time I need only the 10 main 
> fields. We rarely need the other fields for day-to-day analysis.
>
> Regards,
> Jaydeep
>
>
> Ning Zhang wrote:
>
> Hive supports columnar storage (RCFile) but not vertical partitioning. 
> Is there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
>
>
> Hi,
>
> Does hive support Vertical partitioning?
>
> Regards,
> Jaydeep
>
>
>
> The information contained in this communication is intended solely for 
> the use of the individual or entity to whom it is addressed and others 
> authorized to receive it. It may contain confidential or legally 
> privileged information. If you are not the intended recipient you are 
> hereby notified that any disclosure, copying, distribution or taking 
> any action in reliance on the contents of this information is strictly 
> prohibited and may be unlawful. If you have received this 
> communication in error, please notify us immediately by responding to this 
> email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of 
> the information contained in this communication nor for any delay in 
> its receipt.
>
>
>
>
>
>
>
> 
>
>

Vertical partitioning is just as practical in a traditional RDBMS as it would 
be in Hive. Normally you would do it for a few reasons:
1) You have some rarely used columns and you want to reduce the table/row size.
2) Your DBMS has terrible blob/clob/text support and the only way to get large 
objects out of your way is to put them in other tables.

If you go the route of vertical partitioning in Hive, you may have to join to 
select the columns you need. I do not consider row serialization and 
deserialization to be the majority of a Hive job, and in most cases Hadoop 
handles one large file better than two smaller ones. Then again, we have some 
tables with 140+ columns, so I can see vertical partitioning helping with those 
tables, but it doubles the management overhead.


[jira] Commented: (HIVE-293) report deserialize exceptions from serde's via exceptions

2010-06-17 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879866#action_12879866
 ] 

Alex Rovner commented on HIVE-293:
--

What about the case where someone is using a custom input format?

> report deserialize exceptions from serde's via exceptions
> -
>
> Key: HIVE-293
> URL: https://issues.apache.org/jira/browse/HIVE-293
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Joydeep Sen Sarma
>
> lazyserde and dynamicserde should report exceptions on number (and other) 
> parsing errors so higher layers can take the correct action

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1383) allow HBase WAL to be disabled

2010-06-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879850#action_12879850
 ] 

Ning Zhang commented on HIVE-1383:
--

+1. Will commit if tests pass. 

> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch, 
> HIVE-1383.4.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #474

2010-06-17 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-543. Add local mode execution in hive
(Joydeep Sen Sarma via namit)

[jvs] HIVE-1255. Add mathematical UDFs PI, E, degrees, radians, tan,
sign, and atan.  (Edward Capriolo via jvs)

--
[...truncated 13785 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Ou

Re: Hive support to cassandra

2010-06-17 Thread tom kersnick
I really appreciate the help Edward.

/tom


On Thu, Jun 17, 2010 at 8:04 AM, Edward Capriolo wrote:

> On Thu, Jun 17, 2010 at 12:14 AM, tom kersnick  wrote:
>
> > You are not being rude Jeff.  This is a request from the client due to
> ease
> > of use of Cassandra compared to Hbase.  I'm with you on this.  They are
> > looking for apples to apples consistency.  Easy migration of data from
> OLTP
> > (Cassandra) to their Data Warehouse (Cassandra?).  Apparently not.  Is it
> > possible to migrate from Cassandra to Hbase?  Any documentation on this
> > type
> > of push to Hbase from Cassandra would be helpful.
> >
> > Thanks in advance.
> >
> > /tom
> >
> >
> >
> >
> >
> > On Wed, Jun 16, 2010 at 5:44 PM, Jeff Hammerbacher  > >wrote:
> >
> > > Hey Tom,
> > >
> > > I don't want to be rude, but if you're using Cassandra for your data
> > > warehouse environment, you're doing it wrong. HBase is the primary
> focus
> > > for
> > > integration with Hive (see
> > > http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/,
> > > example). Cassandra is a great choice for an OLTP application, but
> > > certainly
> > > not for a data warehouse.
> > >
> > > Later,
> > > Jeff
> > >
> > > On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick 
> > wrote:
> > >
> > > > Quick question for all of you.  It seems that there is more movement
> > > using
> > > > Hive with Hbase rather than Cassandra.  Do you see this changing in
> the
> > > > near
> > > > future?  I have a client who is interested in using Cassandra due to
> > the
> > > > ease of maintenance.  They are planning on using Cassandra for both
> > their
> > > > data warehouse and OLTP environments.  Thoughts?
> > > >
> > > > I saw this ticket and I wanted to ask.
> > > >
> > > > Thanks in advance.
> > > >
> > > > /tom
> > > >
> > > >
> > > > On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo <
> > edlinuxg...@gmail.com
> > > > >wrote:
> > > >
> > > > > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> > > > wrote:
> > > > >
> > > > > > > All,
> > > > > > >
> > > > > > > http://code.google.com/soc/.
> > > > > > >
> > > > > > > It is an interesting thing that Google offers stipends to get
> > open
> > > > > source
> > > > > > > code written. However, last year I was interested in a
> > project
> > > > that
> > > > > > did
> > > > > > > NOT get accepted into GSOC. It was quite deflating to be not
> > > > > > > accepted/rejected.
> > > > > > >
> > > > > > > Money does make the world go around, and if we all had plenty
> of
> > > > money
> > > > > we
> > > > > > > would all have more time to write open source code :) But on
> the
> > > > chance
> > > > > > > your
> > > > > > > application does get rejected consider doing it anyway!
> > > > > > >
> > > > > > > Edward
> > > > > > >
> > > > > >
> > > > > > Definitely Edward, Thanks for the suggestion :)
> > > > > >
> > > > > > shirish
> > > > > >
> > > > >
> > > > > I did not see any cassandra or hive SOC projects
> > > > >
> > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010.
> > > > :(
> > > > > So
> > > > > if no one is going to pick this cassandra interface up I will pick
> it
> > > up
> > > > > after I close some pending things that is two strikes for me
> and
> > > > GSOC.
> > > > >
> > > >
> > >
> >
>
> I am currently in the process of writing a Cassandra storage handler to
> match the HBase one. I'll open a ticket for it. I was looking to tackle it
> after the Hive Variables ticket I am working on.
>
> Edward
>


Re: Vertical partitioning

2010-06-17 Thread Edward Capriolo
On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma <
jaydeep.vishwaka...@mkhoj.com> wrote:

> Just looking at the opportunity and feasibility for it. One of my tables has
> more than 20 fields, where most of the time I need only the 10 main fields. We
> rarely need the other fields for day-to-day analysis.
>
> Regards,
> Jaydeep
>
>
> Ning Zhang wrote:
>
> Hive supports columnar storage (RCFile) but not vertical partitioning. Is
> there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
>
>
> Hi,
>
> Does hive support Vertical partitioning?
>
> Regards,
> Jaydeep
>
>
>
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify us
> immediately by responding to this email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its
> receipt.
>
>
>
>
>
>
>
> 
>
>

Vertical partitioning is just as practical in a traditional RDBMS as it
would be in Hive. Normally you would do it for a few reasons:
1) You have some rarely used columns and you want to reduce the table/row
size.
2) Your DBMS has terrible blob/clob/text support and the only way to get
large objects out of your way is to put them in other tables.

If you go the route of vertical partitioning in Hive, you may have to join
to select the columns you need. I do not consider row serialization and
deserialization to be the majority of a Hive job, and in most cases Hadoop
handles one large file better than two smaller ones. Then again, we have some
tables with 140+ columns, so I can see vertical partitioning helping with those
tables, but it doubles the management overhead.


Build failed in Hudson: Hive-trunk-h0.18 #475

2010-06-17 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-543. Add local mode execution in hive
(Joydeep Sen Sarma via namit)

[jvs] HIVE-1255. Add mathematical UDFs PI, E, degrees, radians, tan,
sign, and atan.  (Edward Capriolo via jvs)

--
[...truncated 13739 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Ou

Re: Hive support to cassandra

2010-06-17 Thread Edward Capriolo
On Thu, Jun 17, 2010 at 12:14 AM, tom kersnick wrote:

> You are not being rude Jeff.  This is a request from the client due to ease
> of use of Cassandra compared to Hbase.  I'm with you on this.  They are
> looking for apples to apples consistency.  Easy migration of data from OLTP
> (Cassandra) to their Data Warehouse (Cassandra?).  Apparently not.  Is it
> possible to migrate from Cassandra to Hbase?  Any documentation on this
> type
> of push to Hbase from Cassandra would be helpful.
>
> Thanks in advance.
>
> /tom
>
>
>
>
>
> On Wed, Jun 16, 2010 at 5:44 PM, Jeff Hammerbacher wrote:
>
> > Hey Tom,
> >
> > I don't want to be rude, but if you're using Cassandra for your data
> > warehouse environment, you're doing it wrong. HBase is the primary focus
> > for
> > integration with Hive (see
> > http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/ for an
> > example). Cassandra is a great choice for an OLTP application, but
> > certainly
> > not for a data warehouse.
> >
> > Later,
> > Jeff
> >
> > On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick 
> wrote:
> >
> > > Quick question for all of you.  It seems that there is more movement
> > using
> > > Hive with HBase rather than Cassandra.  Do you see this changing in the
> > > near
> > > future?  I have a client who is interested in using Cassandra due to
> the
> > > ease of maintenance.  They are planning on using Cassandra for both
> their
> > > data warehouse and OLTP environments.  Thoughts?
> > >
> > > I saw this ticket and I wanted to ask.
> > >
> > > Thanks in advance.
> > >
> > > /tom
> > >
> > >
> > > On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo <
> edlinuxg...@gmail.com
> > > >wrote:
> > >
> > > > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> > > wrote:
> > > >
> > > > > > All,
> > > > > >
> > > > > > http://code.google.com/soc/.
> > > > > >
> > > > > > It is an interesting thing that Google offers stipends to get
> open
> > > > source
> > > > > > code written. However, last year I was interested in a
> project
> > > that
> > > > > did
> > > > > > NOT get accepted into GSOC. It was quite deflating to be not
> > > > > > accepted/rejected.
> > > > > >
> > > > > > Money does make the world go around, and if we all had plenty of
> > > money
> > > > we
> > > > > > would all have more time to write open source code :) But on the
> > > chance
> > > > > > your
> > > > > > application does get rejected consider doing it anyway!
> > > > > >
> > > > > > Edward
> > > > > >
> > > > >
> > > > > Definitely Edward, Thanks for the suggestion :)
> > > > >
> > > > > shirish
> > > > >
> > > >
> > > > I did not see any Cassandra or Hive SOC projects at
> > > > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> > > > So if no one is going to pick this Cassandra interface up, I will
> > > > pick it up after I close some pending things. That is two strikes
> > > > for me and GSOC.
> > > >
> > >
> >
>

I am currently in the process of writing a Cassandra storage handler to
match the HBase one. I'll open a ticket for it. I was looking to tackle it
after the Hive Variables ticket I am working on.

Edward


Hudson build is back to normal : Hive-trunk-h0.17 #472

2010-06-17 Thread Apache Hudson Server
See 




Re: Vertical partitioning

2010-06-17 Thread jaydeep vishwakarma

Just looking at the opportunity and feasibility of it. One of my tables has 
more than 20 fields, where most of the time I need only the 10 main fields. 
We rarely need the other fields for day-to-day analysis.

Regards,
Jaydeep

Ning Zhang wrote:

Hive supports columnar storage (RCFile) but not vertical partitioning. Is 
there any use case for vertical partitioning?
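
For the many-columns / few-needed-columns case described in this thread,
columnar storage can give much of the read-side benefit of vertical
partitioning without splitting the table. A minimal HiveQL sketch of the
idea (table and column names here are hypothetical, chosen only for
illustration):

```sql
-- RCFile lays data out column-wise within each row group, so a query
-- that references only a few columns avoids deserializing the rest.
CREATE TABLE events_rc (
  user_id    BIGINT,
  event_time STRING,
  event_type STRING
  -- ... remaining, rarely-used columns would be declared here ...
)
STORED AS RCFILE;

-- Only the columns named in the select list are materialized during
-- the scan; the rarely-used columns are skipped.
SELECT user_id, event_type
FROM events_rc;
```

Under this layout the frequently-queried "main" columns are read cheaply,
while the rarely-used ones still live in the same table and stay available
for occasional analysis.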

On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:



Hi,

Does Hive support vertical partitioning?

Regards,
Jaydeep



The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you have received 
this communication in error, please notify us immediately by responding to this 
email and then delete it from your system. The firm is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.







