[jira] [Updated] (HIVE-2693) Add DECIMAL data type

2013-01-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-2693:
-

Attachment: HIVE-2693-21.patch

HIVE-2693-21.patch addresses Ashutosh's review.

- Fixes/adds comments
- Error message
- Additional conversion tests
- ASF headers


> Add DECIMAL data type
> -
>
> Key: HIVE-2693
> URL: https://issues.apache.org/jira/browse/HIVE-2693
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Prasad Mujumdar
> Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
> HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
> HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
> HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
> HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-20.patch, 
> HIVE-2693-21.patch, HIVE-2693-all.patch, HIVE-2693.D7683.1.patch, 
> HIVE-2693-fix.patch, HIVE-2693.patch, HIVE-2693-take3.patch, 
> HIVE-2693-take4.patch
>
>
> Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
> template for how to do this.
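
For context, a minimal sketch of the kind of usage this feature enables
(hedged: the table and column names are illustrative, and the exact syntax is
whatever the committed patch supports):

  -- hypothetical table; DECIMAL is the new type added by this patch
  CREATE TABLE prices (item STRING, amount DECIMAL);
  SELECT item, amount * 2
  FROM prices
  WHERE amount > CAST('10.5' AS DECIMAL);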

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3277) Enable Metastore audit logging for non-secure connections

2013-01-10 Thread ransom.hezhiqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ransom.hezhiqiang updated HIVE-3277:


Fix Version/s: 0.10.0

> Enable Metastore audit logging for non-secure connections
> -
>
> Key: HIVE-3277
> URL: https://issues.apache.org/jira/browse/HIVE-3277
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging, Metastore, Security
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Sean Mackrory
> Fix For: 0.10.0
>
> Attachments: HIVE-3277.patch.1, HIVE-3277.patch.2
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550898#comment-13550898
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

bq. Can the index be made optional? In our typical use-case, the old data is
hardly queried, so we are willing to trade off CPU and not support skipping
rows for old data, to save some space.

The way I understand it, index creation can be specified when the file is
written, so it can be made optional. To start with, we may in fact have no
indices and then add them later.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per a file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows aren't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph updated HIVE-3884:
---

Fix Version/s: 0.11.0
 Release Note: Introduces new DESCRIBE PRETTY table_name command and  
hive.cli.pretty.output.num.cols conf parameter.
   Status: Patch Available  (was: Open)

Phabricator diff is at https://reviews.facebook.net/D7851

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It would be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).
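
A hedged usage sketch of the command and conf parameter this patch introduces
(the table name is illustrative; per the description above, a fixed width is
mainly useful for unit tests, while auto-detection is the interactive default):

  -- pin the output width to 80 columns instead of auto-detecting
  SET hive.cli.pretty.output.num.cols=80;
  DESCRIBE PRETTY my_wide_table;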

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph updated HIVE-3884:
---

Attachment: HIVE-3884.1.patch.txt

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It would be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3888) wrong mapside groupby if no partition is being selected

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550891#comment-13550891
 ] 

Ashutosh Chauhan commented on HIVE-3888:


+1 please commit if tests pass.

> wrong mapside groupby if no partition is being selected
> ---
>
> Key: HIVE-3888
> URL: https://issues.apache.org/jira/browse/HIVE-3888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3888.1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3882) config for enabling HIVE-2723

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550887#comment-13550887
 ] 

Namit Jain commented on HIVE-3882:
--

OK

> config for enabling HIVE-2723
> -
>
> Key: HIVE-3882
> URL: https://issues.apache.org/jira/browse/HIVE-3882
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Pamela Vagata
> Attachments: HIVE-3882.patch.1.txt
>
>
> Since it is a backward-incompatible change, it might be a good idea to make it
> configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3888) wrong mapside groupby if no partition is being selected

2013-01-10 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3888:
-

Attachment: hive.3888.1.patch

> wrong mapside groupby if no partition is being selected
> ---
>
> Key: HIVE-3888
> URL: https://issues.apache.org/jira/browse/HIVE-3888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3888.1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3888) wrong mapside groupby if no partition is being selected

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550886#comment-13550886
 ] 

Namit Jain commented on HIVE-3888:
--

https://reviews.facebook.net/D7845

> wrong mapside groupby if no partition is being selected
> ---
>
> Key: HIVE-3888
> URL: https://issues.apache.org/jira/browse/HIVE-3888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3888.1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3888) wrong mapside groupby if no partition is being selected

2013-01-10 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3888:


 Summary: wrong mapside groupby if no partition is being selected
 Key: HIVE-3888
 URL: https://issues.apache.org/jira/browse/HIVE-3888
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3025) Fix Hive ARCHIVE command on 0.22 and 0.23

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550880#comment-13550880
 ] 

Ashutosh Chauhan commented on HIVE-3025:


[~vikram.dixit] I forgot: what was the resolution of your investigation on this?

> Fix Hive ARCHIVE command on 0.22 and 0.23
> -
>
> Key: HIVE-3025
> URL: https://issues.apache.org/jira/browse/HIVE-3025
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-3025.D3195.1.patch
>
>
> archive.q and archive_multi.q fail when Hive is run on top of Hadoop 0.22 or 
> 0.23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550877#comment-13550877
 ] 

Ashutosh Chauhan commented on HIVE-3537:


Also, it seems like it won't be easy to write a test case for this. One 
possibility I can think of is to write a JUnit test case (instead of a .q 
file) which inspects the lock data in ZK and asserts that the lock data is 
wiped out of ZK after a query doing a multi-insert finishes.

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
> hive.3537.4.patch, hive.3537.5.patch, hive.3537.6.patch, hive.3537.7.patch, 
> hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the movetask contains an input/output, but they do not seem to be
> populated correctly.
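
For reference, the multi-table insert shape under discussion looks like this
(a minimal sketch with hypothetical tables; each INSERT clause gets its own
move task, and the proposal is to release the locks on each output as soon as
its move task completes):

  FROM src
  INSERT OVERWRITE TABLE dest1 PARTITION (ds='1')
    SELECT key, value WHERE key < 100
  INSERT OVERWRITE TABLE dest2 PARTITION (ds='1')
    SELECT key, value WHERE key >= 100;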

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3873) lot of tests failing for hadoop 23

2013-01-10 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550875#comment-13550875
 ] 

Prasad Mujumdar commented on HIVE-3873:
---

Looks like there's already a JIRA, HIVE-3025, for the archive tests.

> lot of tests failing for hadoop 23
> --
>
> Key: HIVE-3873
> URL: https://issues.apache.org/jira/browse/HIVE-3873
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> The following tests are failing on hadoop 23:
> [junit] Failed query: archive_excludeHadoop20.q
> [junit] Failed query: archive_multi.q
> [junit] Failed query: index_bitmap.q
> [junit] Failed query: join_filters_overlap.q
> [junit] Failed query: join_nullsafe.q
> [junit] Failed query: list_bucket_dml_6.q
> [junit] Failed query: list_bucket_dml_7.q
> [junit] Failed query: list_bucket_dml_8.q
> [junit] Failed query: list_bucket_query_oneskew_3.q
> [junit] Failed query: parenthesis_star_by.q
> [junit] Failed query: recursive_dir.q
> Some of them may be log updates.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550871#comment-13550871
 ] 

Ashutosh Chauhan commented on HIVE-3824:


+1

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3824.1.patch, hive.3824.3.patch
>
>
> Consider the following testcase:
> create table tst5 (key string, value string) partitioned by (ds string) 
> stored as rcfile;
> insert overwrite table tst5 partition (ds='1') select * from src;
> insert overwrite table tst5 partition (ds='2') select * from src;
> insert overwrite table tst5 partition (ds='3') select * from src;
> alter table tst5 set fileformat sequencefile; 
> insert overwrite table tst5 partition (ds='4') select * from src;
> insert overwrite table tst5 partition (ds='5') select * from src;
> insert overwrite table tst5 partition (ds='6') select * from src;  
> alter table tst5 set serde 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
> insert overwrite table tst5 partition (ds='7') select * from src;
> insert overwrite table tst5 partition (ds='8') select * from src;
> insert overwrite table tst5 partition (ds='9') select * from src;  
> The following query works fine:
>  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
> since both the partitions use ColumnarSerDe
> But the following query fails:
> select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
> (ds='7'));
> since different serdes are used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550862#comment-13550862
 ] 

Ashutosh Chauhan commented on HIVE-3537:


Namit,
Will this render the fix introduced in HIVE-3106 redundant, i.e., will the 
dependency task (and thus the config variable) introduced there no longer be 
required? If not, then shouldn't the config variable always be true for 
concurrency to work correctly?

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
> hive.3537.4.patch, hive.3537.5.patch, hive.3537.6.patch, hive.3537.7.patch, 
> hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the movetask contains an input/output, but they do not seem to be
> populated correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph reassigned HIVE-3884:
--

Assignee: Dilip Joseph  (was: Gang Tim Liu)

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Attachments: describe_test_table.png
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It would be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2439) Upgrade antlr version to 3.4

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550846#comment-13550846
 ] 

Ashutosh Chauhan commented on HIVE-2439:


Comments on https://reviews.facebook.net/D7527

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2439) Upgrade antlr version to 3.4

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550828#comment-13550828
 ] 

Ashutosh Chauhan commented on HIVE-2439:


Yeah, this will make the lives of HCatalog users much easier. I will take a 
look at it soon.

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550826#comment-13550826
 ] 

Ashutosh Chauhan commented on HIVE-2935:


bq. would it be possible for you to take a look at HIVE-3785?

Surely, I will take a look.

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HIVE-2935.3.patch.gz, HS2-changed-files-only.patch, 
> HS2-with-thrift-patch-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3882) config for enabling HIVE-2723

2013-01-10 Thread Pamela Vagata (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550809#comment-13550809
 ] 

Pamela Vagata commented on HIVE-3882:
-

We may want to hold off on checking this in (also, I missed a spot). We've
been talking about committing this as an internal-only flag and then
reverting it once the change has had some time to bake.

Pam





> config for enabling HIVE-2723
> -
>
> Key: HIVE-3882
> URL: https://issues.apache.org/jira/browse/HIVE-3882
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Pamela Vagata
> Attachments: HIVE-3882.patch.1.txt
>
>
> Since it is a backward-incompatible change, it might be a good idea to make it
> configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3882) config for enabling HIVE-2723

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550801#comment-13550801
 ] 

Namit Jain commented on HIVE-3882:
--

+1

running tests

> config for enabling HIVE-2723
> -
>
> Key: HIVE-3882
> URL: https://issues.apache.org/jira/browse/HIVE-3882
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Pamela Vagata
> Attachments: HIVE-3882.patch.1.txt
>
>
> Since it is a backward-incompatible change, it might be a good idea to make it
> configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550797#comment-13550797
 ] 

Hudson commented on HIVE-3875:
--

Integrated in hive-trunk-hadoop1 #6 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/6/])
HIVE-3875. Negative value for hive.stats.ndv.error should be disallowed 
(Shreepadma Venugopalan via cws) (Revision 1431793)

 Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431793
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java


> negative value for hive.stats.ndv.error should be disallowed 
> -
>
> Key: HIVE-3875
> URL: https://issues.apache.org/jira/browse/HIVE-3875
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.11.0
>
> Attachments: HIVE-3875.1.patch.txt
>
>
> Currently, if a negative value is specified for hive.stats.ndv.error in 
> hive-site.xml, it is treated as 0. We should instead throw an exception.
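
A hedged sketch of the intended behavior change (the setting name comes from
this issue; the table, column, and exact error text are illustrative):

  SET hive.stats.ndv.error=-1.0;
  -- before the fix: the negative value is silently treated as 0
  -- after the fix: the column-stats analyzer should reject it with an error
  ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS c;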

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-hadoop2 - Build # 57 - Failure

2013-01-10 Thread Apache Jenkins Server


38 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd

Error Message:
Unexpected exception in setup

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception in setup
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:59)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.extensions.TestSetup.run(TestSetup.java:27)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


FAILED:  
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries

Error Message:
Unexpected exception in setup

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception in setup
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:59)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.extensions.TestSetup.run(TestSetup.java:27)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


FAILED:  
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries

Error Message:
Unexpected exception in setup

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception in setup
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:59)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.extensions.TestSetup.run(TestSetup.java:27)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


FAILED:  
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries

Error Message:
Unexpected exception in setup

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception in setup
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:59)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.ru

[jira] [Created] (HIVE-3887) Upgrade Hive's Avro dependency to version 1.7.3

2013-01-10 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3887:


 Summary: Upgrade Hive's Avro dependency to version 1.7.3
 Key: HIVE-3887
 URL: https://issues.apache.org/jira/browse/HIVE-3887
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated

2013-01-10 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3886:


 Summary: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is 
deprecated
 Key: HIVE-3886
 URL: https://issues.apache.org/jira/browse/HIVE-3886
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.9.0, 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Minor


WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 
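
The fix the warning asks for is a one-line substitution in each
log4j.properties file (a sketch; the appender name "EventCounter" is the
conventional one and is an assumption, not confirmed by this issue):

  # before:
  #   log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
  # after:
  log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter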

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-10 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3875:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Shreepadma!

> negative value for hive.stats.ndv.error should be disallowed 
> -
>
> Key: HIVE-3875
> URL: https://issues.apache.org/jira/browse/HIVE-3875
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.11.0
>
> Attachments: HIVE-3875.1.patch.txt
>
>
> Currently, if a negative value is specified for hive.stats.ndv.error in 
> hive-site.xml, it is treated as 0. We should instead throw an exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-10 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550693#comment-13550693
 ] 

Shreepadma Venugopalan commented on HIVE-3875:
--

Thanks Carl for committing.

> negative value for hive.stats.ndv.error should be disallowed 
> -
>
> Key: HIVE-3875
> URL: https://issues.apache.org/jira/browse/HIVE-3875
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.11.0
>
> Attachments: HIVE-3875.1.patch.txt
>
>
> Currently, if a negative value is specified for hive.stats.ndv.error in 
> hive-site.xml, it is treated as 0. We should instead throw an exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: unicode character as delimiter

2013-01-10 Thread Ho Kenneth - kennho
I'd appreciate it if someone could help out with this issue. Tons of thanks!  :)

I have tried many different combinations but am still not able to get it to
work.

Q: how do we parse the delimiter "þ"?



On 1/10/13 8:08 AM, "Ho Kenneth - kennho"  wrote:

>Thanks for the quick response.
>
>I tried '\376', but it's still not working  :(
>
>
>
>On 1/10/13 6:23 AM, "Dean Wampler" 
>wrote:
>
>>You have to use the octal representation, e.g., ^A is \001.
>>
>>On Wed, Jan 9, 2013 at 8:32 PM, Ho Kenneth - kennho
>>wrote:
>>
>>> Hi all,
>>>
>>> I have an input file that has a Unicode character, þ (thorn), as a
>>> delimiter.
>>>
>>> For example:
>>>
>>> col1þcol2þcol3
>>>
>>> þ has the UTF-8 value 0xC3 0xBE (hex).
>>>
>>> And I have tried the following but no luck:
>>> create table test(col1 string, col2 string, col3 string) row format
>>> delimited fields terminated by '\c3be';
>>>
>>> I'd appreciate your help! Thanks in advance.
>>>
>>> --ken
>>>
>>>
>>>
>>
>>
>>
>>-- 
>>*Dean Wampler, Ph.D.*
>>thinkbiganalytics.com
>>+1-312-339-1330
>
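
A note on the thread above: Hive's delimited text format takes a single byte
as the field terminator, so the octal escape '\376' matches a Latin-1-encoded
þ (0xFE) but not the two-byte UTF-8 sequence 0xC3 0xBE in Ken's file. One
workaround sketch, assuming the file and the DDL are both UTF-8 (the contrib
RegexSerDe class is real, but the jar path and table are illustrative; this
is not advice given in the thread itself):

  ADD JAR /path/to/hive-contrib.jar;
  CREATE TABLE test (col1 STRING, col2 STRING, col3 STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^þ]*)þ([^þ]*)þ([^þ]*)"
  );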



[jira] [Closed] (HIVE-3689) Update website with info on how to report security bugs

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan closed HIVE-3689.
--


> Update website with info on how to report security bugs 
> 
>
> Key: HIVE-3689
> URL: https://issues.apache.org/jira/browse/HIVE-3689
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Eli Collins
>Assignee: Ashutosh Chauhan
> Fix For: 0.10.0
>
>
> The Hive website should be updated with information on how to report 
> potential security vulnerabilities. In Hadoop we have a private security list 
> that anyone can post to and that we point to on our mailing list page: 
> http://hadoop.apache.org/general_lists.html#Security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3689) Update website with info on how to report security bugs

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3689.


   Resolution: Fixed
Fix Version/s: 0.10.0

Updated the site with the info.

> Update website with info on how to report security bugs 
> 
>
> Key: HIVE-3689
> URL: https://issues.apache.org/jira/browse/HIVE-3689
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Eli Collins
>Assignee: Ashutosh Chauhan
> Fix For: 0.10.0
>
>
> The Hive website should be updated with information on how to report 
> potential security vulnerabilities. In Hadoop we have a private security list 
> that anyone can post to and that we point to on our mailing list page: 
> http://hadoop.apache.org/general_lists.html#Security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Closed] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan closed HIVE-1399.
--


> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Fix For: 0.10.0
>
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1399:
---

   Resolution: Fixed
Fix Version/s: 0.10.0
   Status: Resolved  (was: Patch Available)

Fixed via HIVE-2956

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Fix For: 0.10.0
>
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-10 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550592#comment-13550592
 ] 

Kevin Wilfong commented on HIVE-933:


Updated, and responded to comments.  All tests pass.

> Infer bucketing/sorting properties
> --
>
> Key: HIVE-933
> URL: https://issues.apache.org/jira/browse/HIVE-933
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Kevin Wilfong
> Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
> HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
> HIVE-933.6.patch.txt, HIVE-933.7.patch.txt, HIVE-933.8.patch.txt, 
> HIVE-933.9.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties, and 
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata. 
> Currently, the query plan is the same for all the 
> partitions of the table; we can do the following:
> 1. In the first cut, have a simple approach where you take the union of all 
> metadata and create the most defensive plan.
> 2. Enhance mapredWork() to include partition specific operator trees.
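
A hedged sketch of the kind of query this feature targets (the tables are
hypothetical; the idea is that the destination partition could be marked in
the metastore as sorted by key without the user declaring it):

  INSERT OVERWRITE TABLE dest PARTITION (ds = '2013-01-10')
  SELECT key, value FROM src ORDER BY key;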

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-933) Infer bucketing/sorting properties

2013-01-10 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-933:
---

Status: Patch Available  (was: Open)

> Infer bucketing/sorting properties
> --
>
> Key: HIVE-933
> URL: https://issues.apache.org/jira/browse/HIVE-933
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Kevin Wilfong
> Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
> HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
> HIVE-933.6.patch.txt, HIVE-933.7.patch.txt, HIVE-933.8.patch.txt, 
> HIVE-933.9.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties, and 
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata. 
> Currently, the query plan is the same for all the 
> partitions of the table; we can do the following:
> 1. In the first cut, have a simple approach where you take the union of all 
> metadata and create the most defensive plan.
> 2. Enhance mapredWork() to include partition specific operator trees.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-933) Infer bucketing/sorting properties

2013-01-10 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-933:
---

Attachment: HIVE-933.9.patch.txt

> Infer bucketing/sorting properties
> --
>
> Key: HIVE-933
> URL: https://issues.apache.org/jira/browse/HIVE-933
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Kevin Wilfong
> Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
> HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
> HIVE-933.6.patch.txt, HIVE-933.7.patch.txt, HIVE-933.8.patch.txt, 
> HIVE-933.9.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties, and 
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata. 
> Currently, the query plan is the same for all the 
> partitions of the table; we can do the following:
> 1. In the first cut, have a simple approach where you take the union of all 
> metadata and create the most defensive plan.
> 2. Enhance mapredWork() to include partition specific operator trees.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550579#comment-13550579
 ] 

Brock Noland commented on HIVE-1399:


Hi,

I think we can close this issue and link it to HIVE-2956. HIVE-2956 has 
resolved this issue.

For example, the below output comes from a trunk build:

{noformat}
hive> select avg(avg(key)) from src;  
FAILED: SemanticException [Error 10128]: Line 1:11 Not yet supported place for 
UDAF 'avg'
{noformat}

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3859) Expand UDF method selection logic in Function Registry

2013-01-10 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover resolved HIVE-3859.
---

Resolution: Invalid

It's non-trivial to separate out the UDF method selection part of the patch 
(see Gunther's comment regarding it for more details). The code for this patch 
already exists in the parent JIRA's patch (HIVE-2693) so this JIRA no longer 
applies.

> Expand UDF method selection logic in Function Registry
> --
>
> Key: HIVE-3859
> URL: https://issues.apache.org/jira/browse/HIVE-3859
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Affects Versions: 0.9.0
>Reporter: Mark Grover
>
> Presently, the Function Registry uses a constant cost of conversion to decide 
> which method to call. This has worked well until now but wouldn't work when 
> Decimal types get introduced (HIVE-2693). The reason being that a double and 
> a decimal may be good candidates for conversion of an int but with the 
> present costing strategy, they both will have the same cost of conversion. 
> This leads to ambiguity when Hive tries to decide which method in the UDF to 
> call if the UDF implements methods for both double and decimal.
> This needs to get resolved before the decimal code gets committed.
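
As an illustration of the ambiguity described above, here is a minimal sketch 
with invented names (not FunctionRegistry's actual API): under a flat conversion 
cost, int is one step away from both double and decimal, so the resolver cannot 
pick between the two evaluate() overloads.

{noformat}
import java.math.BigDecimal;

public class OverloadAmbiguity {

  // Two candidate UDF methods, as in the scenario above.
  static double evaluate(double x) { return x; }
  static BigDecimal evaluate(BigDecimal x) { return x; }

  // Flat cost model: every implicit conversion costs 1, identity costs 0.
  static int flatCost(Class<?> from, Class<?> to) {
    return from.equals(to) ? 0 : 1;
  }

  public static void main(String[] args) {
    int toDouble = flatCost(int.class, double.class);      // 1
    int toDecimal = flatCost(int.class, BigDecimal.class); // 1 -- a tie
    System.out.println("int->double: " + toDouble
        + ", int->decimal: " + toDecimal
        + " (tie: method selection is ambiguous)");
    // A graded model, e.g. charging int->double less than int->decimal,
    // breaks the tie deterministically.
  }
}
{noformat}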

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2013-01-10 Thread Yin Huai


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java,
> >  line 32
> > 
> >
> > I don't see any modifications to ql/if/queryplan.thrift. Please modify that 
> > file appropriately and add the generated code in the patch.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, 
> > line 36
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java, 
> > line 47
> > 
> >
> > Please add javadocs, explaining this class as well as the fact that 
> > there are currently two classes which extends this. Also, add difference in 
> > behavior of those two classes which necessitates the need for this base 
> > class.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 25
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 26
> > 
> >
> > Unused import.

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java,
> >  line 155
> > 
> >
> > please do return "CorrelationComposite" here instead of "CCO"

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 224
> > 
> >
> > It seems like you are serializing and then immediately deserializing 
> > keys and values here, which I think is required for ReduceSinkOperator 
> > since keys and values are transferred from the mapper process to the reducer 
> > process. This is redundant in CLSReduceSinkOp since it is all running inline 
> > in one operator pipeline in the same memory. So, it looks like this could be 
> > avoided. 
> > I guess doing this keeps the implementation easier, but if this is true, we 
> > should take this up in a follow-up jira as a performance improvement.

Yes, it was for easier implementation. I will add a comment indicating it will 
be addressed in a follow-up jira.
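
A minimal sketch of that follow-up optimization, with invented names: when the 
sink and its consumer run inline in one JVM pipeline, the key/value objects can 
be handed over directly instead of round-tripping through a SerDe.

{noformat}
interface RowSink {
  void consume(Object key, Object value);
}

class InlineReduceSink {
  private final RowSink child;

  InlineReduceSink(RowSink child) {
    this.child = child;
  }

  void process(Object key, Object value) {
    // A real ReduceSinkOperator serializes key/value to bytes, ships them
    // to a reducer, and deserializes on the other side. Inline, both ends
    // share one heap, so the objects can be forwarded as-is.
    child.consume(key, value);
  }
}
{noformat}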


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 275
> > 
> >
> > Does this imply CLSRSOperator cannot have more than one child operator 
> > in any case? If so, can you please add comments stating that, along with a 
> > small description of why that is?

I thought that, in the original plan, a ReduceSinkOperator can only have one 
child. Because a CLSRSOperator replaces a ReduceSinkOperator, it also has a 
single child. Is my understanding correct?


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 285
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how startGroup() is taken care of in processOp()

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 290
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how endGroup() is dealt with in processOp()

done


> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java,
> >  line 294
> > 
> >
> > Please add comments here about why it's overridden with an empty 
> > implementation and how this is taken care of in processOp()

added a comment to explain this method.


- Yin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/#review15124
---


On Nov. 19, 20

[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550552#comment-13550552
 ] 

Ashutosh Chauhan commented on HIVE-3004:


Sorry, this fell off of my radar. Please rebase and I will take a look.

> RegexSerDe should support other column types in addition to STRING
> --
>
> Key: HIVE-3004
> URL: https://issues.apache.org/jira/browse/HIVE-3004
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Carl Steinbach
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550546#comment-13550546
 ] 

Brock Noland commented on HIVE-1399:


OK great! I'll have an updated patch shortly. Doesn't look like I have 
privileges to assign the jira to myself.

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Adam Fokken (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550532#comment-13550532
 ] 

Adam Fokken commented on HIVE-1399:
---

Yes. Please do.



> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550517#comment-13550517
 ] 

Ashutosh Chauhan commented on HIVE-1399:


Sure, please go ahead.

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1905 - Failure

2013-01-10 Thread Apache Jenkins Server
Changes for Build #1905



1 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at 
net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:324)
at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1905)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1905/ to 
view the results.

[jira] [Commented] (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2013-01-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550491#comment-13550491
 ] 

Brock Noland commented on HIVE-1399:


Hi Adam,

I haven't seen an update in a while. Do you mind if I take this up?

Brock

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
>Reporter: Mayank Lahiri
>Assignee: Adam Fokken
> Attachments: HIVE-1399.1.patch.txt
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3882) config for enabling HIVE-2723

2013-01-10 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3882:


Attachment: HIVE-3882.patch.1.txt

https://reviews.facebook.net/D7839

> config for enabling HIVE-2723
> -
>
> Key: HIVE-3882
> URL: https://issues.apache.org/jira/browse/HIVE-3882
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Pamela Vagata
> Attachments: HIVE-3882.patch.1.txt
>
>
> Since it is a backward-incompatible change, it might be a good idea to make it
> configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3882) config for enabling HIVE-2723

2013-01-10 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata reassigned HIVE-3882:
---

Assignee: Pamela Vagata

> config for enabling HIVE-2723
> -
>
> Key: HIVE-3882
> URL: https://issues.apache.org/jira/browse/HIVE-3882
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Pamela Vagata
> Attachments: HIVE-3882.patch.1.txt
>
>
> Since it is a backward-incompatible change, it might be a good idea to make it
> configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3885) CLI command "SHOW PARTITIONS" could be extended to provide LOCATION information

2013-01-10 Thread Sanjay Subramanian (JIRA)
Sanjay Subramanian created HIVE-3885:


 Summary: CLI command "SHOW PARTITIONS" could be extended to 
provide LOCATION information
 Key: HIVE-3885
 URL: https://issues.apache.org/jira/browse/HIVE-3885
 Project: Hive
  Issue Type: New Feature
Reporter: Sanjay Subramanian
Priority: Minor


SHOW PARTITIONS does not provide information on the HDFS location of the data. 
The workaround is to query the metadata DB. The following command will give you 
the HDFS file locations as stored in the metadata tables:
 
echo "select t.TBL_NAME, p.PART_NAME, s.LOCATION from PARTITIONS p, SDS s, TBLS 
t where t.TBL_ID=p.TBL_ID and p.SD_ID=s.SD_ID" | mysql -u <user> -p 
<metastore_db> | grep <table_name> | less

It would be nice if this could be encapsulated in a CLI command SHOW LOCATIONS 
that displays:
PARTITION_NAME    LOCATION

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550379#comment-13550379
 ] 

He Yongqiang commented on HIVE-3874:


I want to list a few thoughts on why I think the ORC solution is a much more 
appealing one.

1. For a BIG data warehouse that stores more than 90% of its existing data in 
rcfile (like FB's >100PB warehouse), data conversion from one format to another 
is something that should definitely be avoided. It is possible to convert some 
tables if there is a big space-saving advantage. But managing two distinct 
formats which do not have any compatibility or inter-operability, or which even 
live in two different code repositories, is another big headache we would want 
to avoid in the first place.
2. Developing the new ORC format in the hive/hcatalog codebase will make hive 
development/operations much easier.
3. Letting the new ORC format have some backward compatibility with RCFile will 
save a lot of trouble.




> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file
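
As a loose illustration of the light-weight index idea in the bullets above 
(this is not the ORC design, just a hypothetical sketch), per-row-group min/max 
statistics let a reader skip entire groups that a push-down predicate cannot 
match:

{noformat}
import java.util.Arrays;
import java.util.List;

public class RowGroupSkipping {

  static class RowGroupStats {
    final long min;
    final long max;
    RowGroupStats(long min, long max) { this.min = min; this.max = max; }
  }

  // Count the row groups a reader must actually scan for: col = target.
  static int groupsToRead(List<RowGroupStats> index, long target) {
    int toRead = 0;
    for (RowGroupStats g : index) {
      if (target >= g.min && target <= g.max) {
        toRead++; // the value may be present: scan this group
      }           // otherwise the whole group is skipped
    }
    return toRead;
  }

  public static void main(String[] args) {
    List<RowGroupStats> index = Arrays.asList(
        new RowGroupStats(0, 99),
        new RowGroupStats(100, 199),
        new RowGroupStats(200, 299));
    System.out.println(groupsToRead(index, 150)); // prints 1
  }
}
{noformat}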

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550375#comment-13550375
 ] 

Russell Jurney commented on HIVE-3874:
--

He had told us this work must go in contrib. See HIVE-3585

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550366#comment-13550366
 ] 

Thejas M Nair commented on HIVE-2935:
-

bq. Does the BigDecimal support require the HIVE-2693 patch? 
Yes, but I think there has been good progress with the review there, and 
hopefully it gets committed soon. If not, we can move that to a different patch. 

bq. The last patch I updated did have Thrift 0.9 generated bindings. Did you 
use that as the base for this work?
The patch I uploaded was based on HIVE-2935.2.notest.patch.txt. The thrift 
changes in your patch should work. 


> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HIVE-2935.3.patch.gz, HS2-changed-files-only.patch, 
> HS2-with-thrift-patch-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550347#comment-13550347
 ] 

He Yongqiang commented on HIVE-3585:


bq. This patch is going to share 90% of its small code with the existing 
AvroSerde that was never shunted into contrib. 

Then why is it so hard to make it part of the existing AvroSerde?

bq. I'm not seeing any technical reasons to block progress. 
Technically, there is no issue. Technically, I am pretty sure this can be done 
well.

bq. Is anyone planning on exercising a -1?

I have listed two options that I insist on: one is to develop it as part of the 
existing avroserde; the other is to put it in contrib or a 3rd-party lib (maybe 
github?).



> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems like a good choice. The shark 
> project uses the fastutil library as its columnar serde library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-10 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549971#comment-13549971
 ] 

Prasad Mujumdar commented on HIVE-2935:
---

[~thejas] Thanks for the additional pieces! I will take a look at the changes and 
let you know if I have any comments/suggestions.
Does the BigDecimal support require the HIVE-2693 patch? If that's the case, 
then it's perhaps better to separate that out into a different patch. The last 
patch I updated did have Thrift 0.9 generated bindings. Did you use that as the 
base for this work?

[~namit] and [~ashutoshc] would it be possible for you to take a look at 
HIVE-3785?

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HIVE-2935.3.patch.gz, HS2-changed-files-only.patch, 
> HS2-with-thrift-patch-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2439) Upgrade antlr version to 3.4

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2439:
---

Fix Version/s: (was: 0.11.0)
   (was: 0.9.1)
   (was: 0.10.0)

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3812) TestCase TestJdbcDriver fails with IBM Java 6

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3812:
---

Fix Version/s: (was: 0.8.1)

> TestCase TestJdbcDriver fails with IBM Java 6
> -
>
> Key: HIVE-3812
> URL: https://issues.apache.org/jira/browse/HIVE-3812
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, Tests
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0
> Environment: Apache Ant 1.7.1
> IBM JDK 6
>Reporter: Renata Ghisloti Duarte de Souza
>Priority: Minor
> Attachments: HIVE-3812.1_0.8.1.patch.txt, HIVE-3812.1_trunk.patch.txt
>
>
> When running testcase TestJdbcDriver with IBM Java 6, it fails with the 
> following error:
>  type="junit.framework.ComparisonFailure">junit.framework.ComparisonFailure: 
> expected:[[{}, 1], [{[c=d, a=b]}, 2]] but was:[[{}, 1], [{[a=b, c=d]}, 2]];
>   at junit.framework.Assert.assertEquals(Assert.java:85)
>   at junit.framework.Assert.assertEquals(Assert.java:91)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes(TestJdbcDriver.java:380)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3768) Document JDBC client configuration for secure clusters

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3768:
---

Fix Version/s: (was: 0.10.0)

> Document JDBC client configuration for secure clusters
> --
>
> Key: HIVE-3768
> URL: https://issues.apache.org/jira/browse/HIVE-3768
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.9.0
>Reporter: Lefty Leverenz
>Assignee: Lefty Leverenz
> Attachments: HIVE-3768.1.patch
>
>
> Document the JDBC client configuration required for starting Hive on a secure 
> cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3812) TestCase TestJdbcDriver fails with IBM Java 6

2013-01-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3812:
---

Fix Version/s: (was: 0.10.0)

> TestCase TestJdbcDriver fails with IBM Java 6
> -
>
> Key: HIVE-3812
> URL: https://issues.apache.org/jira/browse/HIVE-3812
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, Tests
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0
> Environment: Apache Ant 1.7.1
> IBM JDK 6
>Reporter: Renata Ghisloti Duarte de Souza
>Priority: Minor
> Fix For: 0.8.1
>
> Attachments: HIVE-3812.1_0.8.1.patch.txt, HIVE-3812.1_trunk.patch.txt
>
>
> When running testcase TestJdbcDriver with IBM Java 6, it fails with the 
> following error:
>  type="junit.framework.ComparisonFailure">junit.framework.ComparisonFailure: 
> expected:[[{}, 1], [{[c=d, a=b]}, 2]] but was:[[{}, 1], [{[a=b, c=d]}, 2]];
>   at junit.framework.Assert.assertEquals(Assert.java:85)
>   at junit.framework.Assert.assertEquals(Assert.java:91)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes(TestJdbcDriver.java:380)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2935) Implement HiveServer2

2013-01-10 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-2935:


Attachment: HIVE-2935.3.patch.gz

HIVE-2935.3.patch.gz - Attaching patch with bug fixes and updates needed for 
recent changes in hive 0.10/trunk.

I am looking for suggestions on how to get this reviewed and committed. As the 
changes to core are being reviewed as part of HIVE-3785, I think it would be 
better to continue with the earlier patch, and add the fixes in this patch 
separately. 

The patch also has support for the decimal type (HIVE-2693), which is yet to be 
committed. 
The TestBeelineDriver tests work fine if applied to branch 0.10. In trunk, the 
changes to describe table output (HIVE-3140) are resulting in two headers 
getting printed and problems in formatting, and are causing the TestBeelineDriver 
tests to fail. I think the change in HIVE-3140 needs to be revisited.

Highlights of the changes in this patch -
- Add support for the decimal data type in HS2 and the JDBC driver. The current 
implementation uses String to transport BigDecimal (see the sketch after this 
list).
- Incorporate a basic unit test suite for HiveServer2. A handpicked sample of 
TestCliDriver tests is run using jdbc + HS2 under the TestBeelineDriver tests by 
default. These tests don't have the issues described earlier and are expected to 
pass. This also limits the total test runtime.
- Fix the problem with the "current database" being retained between sessions 
(in the Hive class).
- HiveServer2 with concurrency results in incorrect stats - disabling the stats 
test and masking stats.
- Regenerated thrift code for HiveServer2 + thrift 0.9.
- Re-enable type verification for hive variables/settings. 
- Fix OOM on the HiveServer - when running multiple execute operations within a 
statement, only the last one was being cleaned up at the server, leaving 
orphaned objects on the server.
- Correct handling of binary columns in the server and driver.
- The ANSI standard dictates that a null column should be printed as NULL - 
regenerate testbeeline benchmarks and make them compatible with the hive cli.
- Enable doAs() functionality for HS2.
- Fix the string representation of complex types to bring the jdbc driver into 
compliance with the hive client.
- Add support for setting properties on the command line (using -hiveconf) for 
hiveserver2. 
- Make the handleToSession map in SessionManager a ConcurrentHashMap, as it is 
used concurrently by multiple threads.
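
For the decimal item above, a minimal sketch of what the String transport means 
on the JDBC client side. This is illustrative only: it uses the standard JDBC 
API plus an assumed HiveServer2 URL, column, and table name, and presumes the 
Hive JDBC driver is on the classpath.

{noformat}
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DecimalOverString {
  public static void main(String[] args) throws SQLException {
    // Assumed URL and table; adjust for a real HiveServer2 instance.
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT dec_col FROM t LIMIT 1")) {
      while (rs.next()) {
        // The wire value arrives as a String; BigDecimal(String) preserves
        // the exact digits, unlike a round trip through double.
        BigDecimal value = new BigDecimal(rs.getString(1));
        System.out.println(value.toPlainString());
      }
    }
  }
}
{noformat}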

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HIVE-2935.3.patch.gz, HS2-changed-files-only.patch, 
> HS2-with-thrift-patch-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549916#comment-13549916
 ] 

Doug Cutting commented on HIVE-3874:


A way to proceed with improvements to Trevni might be, once HIVE-3585 is 
committed, to propose a patch to Trevni together with a Hive benchmark that 
illustrates its advantage.  Then we could quantitatively demonstrate the 
advantage of each proposed improvement.  With HIVE-3585 we should be able to 
quantitatively demonstrate the advantages of Trevni as-is over RCFile and 
SequenceFile.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3884:
--

Assignee: Gang Tim Liu

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Gang Tim Liu
>Priority: Minor
> Attachments: describe_test_table.png
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It will be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Apache Hive 0.10.0 Release Candidate 0

2013-01-10 Thread Carl Steinbach
Congratulations to all Hive contributors on another job well done, and to
Ashutosh in particular for his work as release manager!

On Thu, Jan 10, 2013 at 7:58 AM, Ashutosh Chauhan wrote:

> With 3 binding +1s and two non-bindings +1s, vote passed.
>
> Thanks Alan, Namit, Alex and Carl for your time on testing and voting on
> RC. Much Appreciated.
>
> I will push out the artifacts. I will send a note on list announcing the
> release once these artifacts are available.
>
> Thanks,
> Ashutosh
>
>
> On Wed, Jan 9, 2013 at 11:02 AM, Carl Steinbach  wrote:
>
> > +1 (binding)
> >
> > - ran tests
> > - worked through some tutorial exercises
> >
> >
> > On Tue, Jan 8, 2013 at 10:56 PM, Alexander Alten-Lorenz <
> > wget.n...@gmail.com
> > > wrote:
> >
> > > +1 (non binding)
> > > - ran build, tests
> > > - sample queries ran too
> > >
> > > Thanks for driving this,
> > >  Alex
> > >
> > > On Jan 9, 2013, at 6:19 AM, Namit Jain  wrote:
> > >
> > > > +1
> > > >
> > > > Build from src.
> > > > Ran some sanity tests both from bin and compiled src - they looked
> good
> > > >
> > > >
> > > >
> > > > On 12/27/12 3:08 AM, "Ashutosh Chauhan" 
> > > wrote:
> > > >
> > > >> +1
> > > >> Built from sources. Ran unit tests. All looked good.
> > > >>
> > > >> Thanks,
> > > >> Ashutosh
> > > >>
> > > >>
> > > >> On Fri, Dec 21, 2012 at 2:25 PM, Alan Gates 
> > > wrote:
> > > >>
> > > >>> +1 (non-binding)
> > > >>> Checked the check sums and key signatures.  Installed it and ran a
> > few
> > > >>> queries.  All looked good.
> > > >>>
> > > >>> As a note Hive should be offering a src only release and a
> > convenience
> > > >>> binary rather than two binaries, one with the source and one
> without.
> > > >>> See
> > > >>> the thread on general@incubator discussing this:
> > > >>>
> > > >>>
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-general/201203.mbox/%3
> > > >>> CCAOFYJNY%3DEjVHrWVvAedR3OKwCv-BkTaCbEu0ufp7OZR_gpCTiA%
> > > 40mail.gmail.com%3
> > > >>> E I think this can be solved later and need not block this release.
> > > >>>
> > > >>> Alan.
> > > >>>
> > > >>> On Dec 18, 2012, at 10:23 PM, Ashutosh Chauhan wrote:
> > > >>>
> > >  Apache Hive 0.10.0 Release Candidate 0 is available here:
> > >  http://people.apache.org/~hashutosh/hive-0.10.0-rc0/
> > > 
> > >  Maven artifacts are available here:
> > > 
> > > >>>
> > > >>>
> > >
> >
> https://repository.apache.org/content/repositories/orgapachehive-049/org/
> > > >>> apache/hive/
> > > 
> > > 
> > >  Release notes are available at:
> > > 
> > > >>>
> > > >>>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&s
> > > >>>
> > >
> tyleName=Text&projectId=12310843&Create=Create&atl_token=A5KQ-2QAV-T4JA-F
> > > >>> DED%7C70f39c6dd3cf337eaa0e3a0359687cf608903879%7Clin
> > > 
> > > 
> > >  Voting will conclude in 72 hours.
> > > 
> > >  Hive PMC Members: Please test and vote.
> > > 
> > >  Thanks,
> > > 
> > >  Ashutosh
> > > >>>
> > > >>>
> > > >
> > >
> > > --
> > > Alexander Alten-Lorenz
> > > http://mapredit.blogspot.com
> > > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> > >
> > >
> >
>


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph updated HIVE-3884:
---

Attachment: describe_test_table.png

Output of DESCRIBE test_table for the following table:

CREATE TABLE test_table(
col1 INT COMMENT 'col1 one line comment',
col2 STRING COMMENT 'col2 very long comment that is greater than 80 chars 
and is likely to spill into multiple lines',
col3 STRING COMMENT 'col3 very long multi-line comment where each line is 
very long by itself and is likely to spill
into multiple lines.  Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Proin in dolor nisl, sodales
adipiscing tortor. Integer venenatis',
col4 INT COMMENT 'col4 one line comment',
col5_VeryLongColumnNameThatMessesUpAlignment INT COMMENT 'col5 one line 
comment',
col6 STRING COMMENT 'col6 one line comment'
);

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Priority: Minor
> Attachments: describe_test_table.png
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It will be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-10 Thread Dilip Joseph (JIRA)
Dilip Joseph created HIVE-3884:
--

 Summary: Better align columns in DESCRIBE table_name output to 
make more human-readable
 Key: HIVE-3884
 URL: https://issues.apache.org/jira/browse/HIVE-3884
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.9.0
Reporter: Dilip Joseph
Priority: Minor


If a table contains very long comments or very long column names, the output of 
DESCRIBE table_name is not aligned nicely.  The attached screenshot shows the 
following two problems:

1. Rows with long column names do not align well with other columns.
2. Rows with long comments wrap to the next line, and make it hard to read the 
output.  The wrapping behavior depends on the width of the user's terminal.

It will be nice to have a DESCRIBE PRETTY table_name command that will produce 
nicely formatted output that avoids the two problems mentioned above.  It is 
better to introduce a new DESCRIBE PRETTY command rather than change the 
behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that we 
avoid breaking any scripts that automatically parse the output.

Since the pretty formatting depends on the current terminal width, we need a 
new hive conf parameter to tell the CLI to auto-detect the current terminal 
width or to use a fixed width (needed for unit tests).
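
A minimal sketch of the width-resolution logic this would need; the conf key 
named in the comment is hypothetical, not an existing Hive parameter.

{noformat}
public class PrettyWidth {

  static final int DEFAULT_WIDTH = 80;

  // confValue: a fixed width from a hypothetical conf key such as
  // "hive.cli.pretty.output.num.cols", with -1 meaning auto-detect.
  // detectedWidth: the terminal width, or -1 if it cannot be determined.
  static int resolveWidth(int confValue, int detectedWidth) {
    if (confValue > 0) {
      return confValue;      // fixed width keeps unit tests deterministic
    }
    if (detectedWidth > 0) {
      return detectedWidth;  // auto-detected width for interactive use
    }
    return DEFAULT_WIDTH;    // safe fallback
  }

  public static void main(String[] args) {
    System.out.println(resolveWidth(-1, 120));  // 120 (auto-detect)
    System.out.println(resolveWidth(100, 120)); // 100 (fixed, for tests)
  }
}
{noformat}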

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549870#comment-13549870
 ] 

He Yongqiang commented on HIVE-3874:


bq. It would be possible to extend the RCFile reader to recognize an ORC file 
and to have it delegate to the ORC File reader.
It will be great to have this support. In that case, what is the file format of 
the partition/table: rcfile, or orcfile?

When we did the conversion of old data from sequencefile to rcfile a long time 
ago, it was a big headache to handle errors like "unrecognized fileformat or 
corruption" because there is no interoperability between the two formats. Most 
of the errors we saw were because the table/partition format did not match the 
actual data format.

Two examples:
1. an old partition's data is rcfile, while a new partition's data is in orc format. 
2. within one partition, some files are rcfile and some files are in orc format.
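
A minimal sketch, with invented magic values, of the delegation idea: peek at a 
file's header bytes and dispatch to the matching reader, so mixed-format 
partitions like the two examples above do not surface as unrecognized-format 
errors.

{noformat}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class FormatSniffer {

  enum Format { RCFILE, ORC, UNKNOWN }

  // The exact magic bytes here are placeholders; a real implementation
  // would check the actual RCFile header and ORC postscript.
  static Format sniff(String path) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      byte[] head = new byte[3];
      f.readFully(head);
      if (Arrays.equals(head, new byte[] {'R', 'C', 'F'})) {
        return Format.RCFILE;
      }
      if (Arrays.equals(head, new byte[] {'O', 'R', 'C'})) {
        return Format.ORC;
      }
      return Format.UNKNOWN;
    }
  }
  // A delegating reader would instantiate the matching RecordReader here
  // instead of failing on the mismatch.
}
{noformat}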



> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549867#comment-13549867
 ] 

Doug Cutting commented on HIVE-3874:


Owen, did you consider proposing improvements to Trevni instead?

Addressing your four points of distinction with Trevni:

 - How is Trevni's type model incompatible with Hive?  Is that irreparable?
 - Might dictionaries be somehow added to Trevni?
 - What sort of indexes are required in addition to those that Trevni supports, 
where the initial value of every block may be stored before all the blocks, 
permitting random access by value to the blocks?  If something different is 
required, might that be added to Trevni?
 - Trevni uses relatively small compression blocks (~64k) that may be skipped.  
How would block mode substantially improve this?  If it would, might this 
change be made to Trevni?

Thanks!

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549839#comment-13549839
 ] 

Owen O'Malley commented on HIVE-3874:
-

Namit, for pure Hive users there aren't any advantages of Trevni over ORC.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-10 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.10.patch

> user should not specify mapjoin to perform sort-merge bucketed join
> ---
>
> Key: HIVE-3403
> URL: https://issues.apache.org/jira/browse/HIVE-3403
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3403.10.patch, hive.3403.1.patch, 
> hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, 
> hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
> mapjoin hint.
> The user should not specify any hints.
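
A sketch of the current and desired behavior (tables a and b are hypothetical, 
both assumed bucketed and sorted on the join key):

    SET hive.optimize.bucketmapjoin = true;
    SET hive.optimize.bucketmapjoin.sortedmerge = true;
    -- today the user must also supply the hint:
    SELECT /*+ MAPJOIN(b) */ a.key, a.value
    FROM a JOIN b ON a.key = b.key;
    -- goal: the same query without the /*+ MAPJOIN(b) */ hint should still
    -- be planned as a sort-merge bucket join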

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549804#comment-13549804
 ] 

Namit Jain commented on HIVE-3874:
--

What I meant was: for a pure Hive user (who does not inherit data from 
anywhere else), is there any advantage of Trevni over ORC?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549784#comment-13549784
 ] 

Owen O'Malley commented on HIVE-3874:
-

Namit, I obviously did consider Trevni, but it didn't support some of the 
features that I wanted:
* using the hive type model
* more advanced encodings like dictionaries
* the ability to support push down predicates for skipping row groups
* running compression in block mode rather than streaming so that the reader 
can skip entire compression blocks

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549774#comment-13549774
 ] 

Owen O'Malley commented on HIVE-3874:
-

He Yongqiang, the APIs to the two formats are significantly different. It would 
be possible to extend the RCFile reader to recognize an ORC file and to have it 
delegate to the ORC File reader.

The other direction (having the ORC file reader parse an RCFile) isn't 
possible, because ORC provides operations that would be very expensive or 
impossible to implement in RCFile.

One concern with making the RCFile reader delegate to the ORC file reader is 
that RCFile returns binary values that are interpreted by the serde, while in 
ORC deserialization happens in the reader. Therefore, the adaptor would either 
need to re-serialize the data or require changes in the serde as well.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: unicode character as delimiter

2013-01-10 Thread Ho Kenneth - kennho
Thanks for the quick response.

I tried '\376', but it's still not working  :(



On 1/10/13 6:23 AM, "Dean Wampler" 
wrote:

>You have to use the octal representation, e.g., ^A is \001.
>
>On Wed, Jan 9, 2013 at 8:32 PM, Ho Kenneth - kennho
>wrote:
>
>> Hi all,
>>
>> I have an input file that has a unicode character as a delimiter, which
>>is
>> þ  (thorn)
>>
>> For example:
>>
>> col1þcol2þcol3
>>
>>   Þ has a value of UTF-8(hex) 0xC3 0xBE (c3be)
>>
>> And I have tried the following but no luck:
>> create table test(col1 string, col2 string, col3 string) row format
>> delimited fields terminated by '\c3be';
>>
>> I'd appreciate your help! Thanks in advance.
>>
>> --ken
>>
>>
>
>
>
>-- 
>*Dean Wampler, Ph.D.*
>thinkbiganalytics.com
>+1-312-339-1330
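
For reference, the suggestion amounts to something like the sketch below, 
assuming the input file is encoded in a single-byte charset such as Latin-1, 
where þ is the byte 0xFE (octal 376); Hive's field delimiter is a single byte, 
so the two-byte UTF-8 sequence 0xC3 0xBE cannot be expressed directly:

    CREATE TABLE test (col1 STRING, col2 STRING, col3 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\376';

If the file really is UTF-8, '\376' will not match the two-byte sequence, 
which may explain why it still fails; re-encoding the file to a single-byte 
charset is one possible workaround.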



Re: [VOTE] Apache Hive 0.10.0 Release Candidate 0

2013-01-10 Thread Ashutosh Chauhan
With 3 binding +1s and two non-binding +1s, the vote passed.

Thanks Alan, Namit, Alex and Carl for your time testing and voting on the
RC. Much appreciated.

I will push out the artifacts. I will send a note on list announcing the
release once these artifacts are available.

Thanks,
Ashutosh


On Wed, Jan 9, 2013 at 11:02 AM, Carl Steinbach  wrote:

> +1 (binding)
>
> - ran tests
> - worked through some tutorial exercises
>
>
> On Tue, Jan 8, 2013 at 10:56 PM, Alexander Alten-Lorenz <
> wget.n...@gmail.com
> > wrote:
>
> > +1 (non binding)
> > - ran build, tests
> > - sample queries ran too
> >
> > Thanks for driving this,
> >  Alex
> >
> > On Jan 9, 2013, at 6:19 AM, Namit Jain  wrote:
> >
> > > +1
> > >
> > > Build from src.
> > > Ran some sanity tests both from bin and compiled src - they looked good
> > >
> > >
> > >
> > > On 12/27/12 3:08 AM, "Ashutosh Chauhan" 
> > wrote:
> > >
> > >> +1
> > >> Built from sources. Ran unit tests. All looked good.
> > >>
> > >> Thanks,
> > >> Ashutosh
> > >>
> > >>
> > >> On Fri, Dec 21, 2012 at 2:25 PM, Alan Gates 
> > wrote:
> > >>
> > >>> +1 (non-binding)
> > >>> Checked the check sums and key signatures.  Installed it and ran a
> few
> > >>> queries.  All looked good.
> > >>>
> > >>> As a note Hive should be offering a src only release and a
> convenience
> > >>> binary rather than two binaries, one with the source and one without.
> > >>> See
> > >>> the thread on general@incubator discussing this:
> > >>>
> > >>>
> >
> http://mail-archives.apache.org/mod_mbox/incubator-general/201203.mbox/%3
> > >>> CCAOFYJNY%3DEjVHrWVvAedR3OKwCv-BkTaCbEu0ufp7OZR_gpCTiA%
> > 40mail.gmail.com%3
> > >>> E I think this can be solved later and need not block this release.
> > >>>
> > >>> Alan.
> > >>>
> > >>> On Dec 18, 2012, at 10:23 PM, Ashutosh Chauhan wrote:
> > >>>
> >  Apache Hive 0.10.0 Release Candidate 0 is available here:
> >  http://people.apache.org/~hashutosh/hive-0.10.0-rc0/
> > 
> >  Maven artifacts are available here:
> > 
> > >>>
> > >>>
> >
> https://repository.apache.org/content/repositories/orgapachehive-049/org/
> > >>> apache/hive/
> > 
> > 
> >  Release notes are available at:
> > 
> > >>>
> > >>>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&s
> > >>>
> > tyleName=Text&projectId=12310843&Create=Create&atl_token=A5KQ-2QAV-T4JA-F
> > >>> DED%7C70f39c6dd3cf337eaa0e3a0359687cf608903879%7Clin
> > 
> > 
> >  Voting will conclude in 72 hours.
> > 
> >  Hive PMC Members: Please test and vote.
> > 
> >  Thanks,
> > 
> >  Ashutosh
> > >>>
> > >>>
> > >
> >
> > --
> > Alexander Alten-Lorenz
> > http://mapredit.blogspot.com
> > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> >
> >
>


Re: unicode character as delimiter

2013-01-10 Thread Dean Wampler
You have to use the octal representation, e.g., ^A is \001.

On Wed, Jan 9, 2013 at 8:32 PM, Ho Kenneth - kennho
wrote:

> Hi all,
>
> I have an input file that has a unicode character as a delimiter, which is
> þ  (thorn)
>
> For example:
>
> col1þcol2þcol3
>
>   Þ has a value of UTF-8(hex) 0xC3 0xBE (c3be)
>
> And I have tried the following but no luck:
> create table test(col1 string, col2 string, col3 string) row format
> delimited fields terminated by '\c3be';
>
> I'd appreciate your help! Thanks in advance.
>
> --ken
>



-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549559#comment-13549559
 ] 

Namit Jain commented on HIVE-3874:
--

[~owen.omalley], what are your thoughts on Trevni? From the ppt, ORC strictly 
looks better than Trevni.
Should we focus more on ORC in that case?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549556#comment-13549556
 ] 

Namit Jain commented on HIVE-3874:
--


Can the index be made optional? In our typical use-case, the old data is 
hardly queried, so we are willing to trade off cpu and not support skipping
rows for old data in order to save some space.

These are not v1 requirements, but might be good to have.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2439) Upgrade antlr version to 3.4

2013-01-10 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549555#comment-13549555
 ] 

Thiruvel Thirumoolan commented on HIVE-2439:


I am also working on running HCatalog unit tests with 0.23, and it needs hive 
jars with this fix [as hcat depends on pig10, which in turn depends on antlr 
3.4]. Currently I push hive jars locally for testing, but it would be nice to 
see this go in.

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
>Assignee: Thiruvel Thirumoolan
> Fix For: 0.10.0, 0.9.1, 0.11.0
>
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3873) lot of tests failing for hadoop 23

2013-01-10 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549547#comment-13549547
 ] 

Thiruvel Thirumoolan commented on HIVE-3873:


Some of the tests also fail when hive-10 is tested with 0.23.4.

The following also fail in 0.10; I have a patch for branch 10 if required. 
Reasons vary from output in a different order (for queries without ORDER BY) 
to changes in test cases.
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parenthesis_star_by
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_recursive_dir
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_table_access_keys_stats


udaf_percentile_approx on branch 10 fails with a data quality issue; I am 
checking that out.
[junit] 20c20
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 47c47
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 74c74
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]
[junit] 101c101
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]

Archive tests also fail on 10, but I haven't looked at them yet.

> lot of tests failing for hadoop 23
> --
>
> Key: HIVE-3873
> URL: https://issues.apache.org/jira/browse/HIVE-3873
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> The following tests are failing on hadoop 23:
> [junit] Failed query: archive_excludeHadoop20.q
> [junit] Failed query: archive_multi.q
> [junit] Failed query: index_bitmap.q
> [junit] Failed query: join_filters_overlap.q
> [junit] Failed query: join_nullsafe.q
> [junit] Failed query: list_bucket_dml_6.q
> [junit] Failed query: list_bucket_dml_7.q
> [junit] Failed query: list_bucket_dml_8.q
> [junit] Failed query: list_bucket_query_oneskew_3.q
> [junit] Failed query: parenthesis_star_by.q
> [junit] Failed query: recursive_dir.q
> Some of them may be log updates.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-hadoop2 - Build # 56 - Still Failing

2013-01-10 Thread Apache Jenkins Server
Changes for Build #55

Changes for Build #56
[kevinwilfong] HIVE-3552. performant manner for performing 
cubes/rollups/grouping sets for a high number of grouping set keys.




38 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop

Error Message:
Unexpected exception in setup

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception in setup
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.setUp(TestHBaseNegativeCliDriver.java:38)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.extensions.TestSetup.run(TestSetup.java:27)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


FAILED:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5610)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4(TestCliDriver.java:2375)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


REGRESSION:  org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5610)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin(TestCliDriver.java:4183)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.

[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549459#comment-13549459
 ] 

Hudson commented on HIVE-3552:
--

Integrated in Hive-trunk-hadoop2 #56 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/56/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for 
a high number of grouping set keys. (Revision 1430979)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml


> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.
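
A sketch of the kind of query this targets: with 8 grouping columns, each 
input row expands to 2^8 = 256 hash-table entries. The parameter name below is 
assumed from the committed HiveConf change and may differ in later releases:

    -- above this grouping-set cardinality, plan the extra MR job
    SET hive.new.job.grouping.set.cardinality = 30;
    SELECT a, b, c, d, e, f, g, h, count(1)
    FROM t
    GROUP BY a, b, c, d, e, f, g, h WITH CUBE;   -- 2^8 = 256 combinations per row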

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549431#comment-13549431
 ] 

He Yongqiang commented on HIVE-3874:


That should work; I just want to make sure they have similar APIs, so other 
tools/utilities will work automatically or need only small changes. One 
example is the block merger.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-1649:
---

Affects Version/s: 0.6.0

> Ability to update counters and status from TRANSFORM scripts
> 
>
> Key: HIVE-1649
> URL: https://issues.apache.org/jira/browse/HIVE-1649
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Carl Steinbach
> Attachments: HIVE-1649.1.patch
>
>
> Hadoop Streaming supports the ability to update counters and status by 
> writing specially coded messages to the script's stderr stream.
> A streaming process can use the stderr to emit counter information. 
> {{reporter:counter:<group>,<counter>,<amount>}} should be sent to stderr to 
> update the counter.
> A streaming process can use the stderr to emit status information. To set a 
> status, {{reporter:status:<message>}} should be sent to stderr.
> Hive should support these same features with its TRANSFORM mechanism.
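
A sketch of what the requested feature would look like from HiveQL; 
my_script.py is a hypothetical user script, and the stderr lines follow the 
Hadoop Streaming convention quoted above:

    ADD FILE my_script.py;
    -- my_script.py writes data rows to stdout and, on stderr, lines such as:
    --   reporter:counter:MyGroup,RowsSeen,1
    --   reporter:status:processing input
    SELECT TRANSFORM (key, value)
      USING 'python my_script.py'
      AS (key, value)
    FROM src;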

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1904 - Fixed

2013-01-10 Thread Apache Jenkins Server
Changes for Build #1903

Changes for Build #1904
[kevinwilfong] HIVE-3552. performant manner for performing 
cubes/rollups/grouping sets for a high number of grouping set keys.




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1904)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1904/ to 
view the results.

[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549423#comment-13549423
 ] 

Hudson commented on HIVE-3552:
--

Integrated in Hive-trunk-h0.21 #1904 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1904/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for 
a high number of grouping set keys. (Revision 1430979)

 Result = SUCCESS
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml


> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-1649:
--

Attachment: (was: HIVE-305.1.patch)

> Ability to update counters and status from TRANSFORM scripts
> 
>
> Key: HIVE-1649
> URL: https://issues.apache.org/jira/browse/HIVE-1649
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Carl Steinbach
> Attachments: HIVE-1649.1.patch
>
>
> Hadoop Streaming supports the ability to update counters and status by 
> writing specially coded messages to the script's stderr stream.
> A streaming process can use the stderr to emit counter information. 
> {{reporter:counter:<group>,<counter>,<amount>}} should be sent to stderr to 
> update the counter.
> A streaming process can use the stderr to emit status information. To set a 
> status, {{reporter:status:<message>}} should be sent to stderr.
> Hive should support these same features with its TRANSFORM mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-1649:
--

Attachment: HIVE-1649.1.patch

> Ability to update counters and status from TRANSFORM scripts
> 
>
> Key: HIVE-1649
> URL: https://issues.apache.org/jira/browse/HIVE-1649
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Carl Steinbach
> Attachments: HIVE-1649.1.patch
>
>
> Hadoop Streaming supports the ability to update counters and status by 
> writing specially coded messages to the script's stderr stream.
> A streaming process can use the stderr to emit counter information. 
> {{reporter:counter:<group>,<counter>,<amount>}} should be sent to stderr to 
> update the counter.
> A streaming process can use the stderr to emit status information. To set a 
> status, {{reporter:status:<message>}} should be sent to stderr.
> Hive should support these same features with its TRANSFORM mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-305:
-

Release Note: I used the trunk to create this patch:  
http://svn.apache.org/repos/asf/hive/trunk 
  Status: Patch Available  (was: Open)

> Port Hadoop streaming's counters/status reporters to Hive Transforms
> 
>
> Key: HIVE-305
> URL: https://issues.apache.org/jira/browse/HIVE-305
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Venky Iyer
> Attachments: HIVE-305.1.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-1328
> " Introduced a way for a streaming process to update global counters and 
> status using stderr stream to emit information. Use 
> "reporter:counter:<group>,<counter>,<amount>" to update a counter. Use 
> "reporter:status:<message>" to update status. "

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-1649:
--

Attachment: HIVE-305.1.patch

> Ability to update counters and status from TRANSFORM scripts
> 
>
> Key: HIVE-1649
> URL: https://issues.apache.org/jira/browse/HIVE-1649
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Carl Steinbach
> Attachments: HIVE-305.1.patch
>
>
> Hadoop Streaming supports the ability to update counters and status by 
> writing specially coded messages to the script's stderr stream.
> A streaming process can use the stderr to emit counter information. 
> {{reporter:counter:<group>,<counter>,<amount>}} should be sent to stderr to 
> update the counter.
> A streaming process can use the stderr to emit status information. To set a 
> status, {{reporter:status:<message>}} should be sent to stderr.
> Hive should support these same features with its TRANSFORM mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-305:
-

Attachment: HIVE-305.1.patch

> Port Hadoop streaming's counters/status reporters to Hive Transforms
> 
>
> Key: HIVE-305
> URL: https://issues.apache.org/jira/browse/HIVE-305
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Venky Iyer
> Attachments: HIVE-305.1.patch
>
>
> https://issues.apache.org/jira/browse/HADOOP-1328
> " Introduced a way for a streaming process to update global counters and 
> status using stderr stream to emit information. Use 
> "reporter:counter:<group>,<counter>,<amount>" to update a counter. Use 
> "reporter:status:<message>" to update status. "

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Guo Hongjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Hongjie updated HIVE-1649:
--

Status: Patch Available  (was: Open)

Port Hadoop streaming's counters/status reporters to Hive Transforms

> Ability to update counters and status from TRANSFORM scripts
> 
>
> Key: HIVE-1649
> URL: https://issues.apache.org/jira/browse/HIVE-1649
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Carl Steinbach
>
> Hadoop Streaming supports the ability to update counters and status by 
> writing specially coded messages to the script's stderr stream.
> A streaming process can use the stderr to emit counter information. 
> {{reporter:counter:<group>,<counter>,<amount>}} should be sent to stderr to 
> update the counter.
> A streaming process can use the stderr to emit status information. To set a 
> status, {{reporter:status:<message>}} should be sent to stderr.
> Hive should support these same features with its TRANSFORM mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549416#comment-13549416
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3874:
---

Bumping up the version number for ORC and transparently forwarding old data to 
the current file format should work, no?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira