[jira] [Updated] (HIVE-4359) Remove old versions of the javadoc
[ https://issues.apache.org/jira/browse/HIVE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4359: Attachment: h-4359.patch Combined with {code} % svn rm publish/docs/r0.{3,4,5,6,7,8}.0 {code} > Remove old versions of the javadoc > -- > > Key: HIVE-4359 > URL: https://issues.apache.org/jira/browse/HIVE-4359 > Project: Hive > Issue Type: Task > Components: Website >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: h-4359.patch > > > Delete the old versions of the javadoc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4359) Remove old versions of the javadoc
[ https://issues.apache.org/jira/browse/HIVE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-4359. - Resolution: Fixed I just committed this. > Remove old versions of the javadoc > -- > > Key: HIVE-4359 > URL: https://issues.apache.org/jira/browse/HIVE-4359 > Project: Hive > Issue Type: Task > Components: Website >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: h-4359.patch > > > Delete the old versions of the javadoc.
[jira] [Commented] (HIVE-4189) ORC fails with String column that ends in lots of nulls
[ https://issues.apache.org/jira/browse/HIVE-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634583#comment-13634583 ] Owen O'Malley commented on HIVE-4189: - +1 looks good. > ORC fails with String column that ends in lots of nulls > --- > > Key: HIVE-4189 > URL: https://issues.apache.org/jira/browse/HIVE-4189 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.11.0 >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Attachments: HIVE-4189.1.patch.txt, HIVE-4189.2.patch.txt > > > When ORC attempts to write out a string column that ends in enough nulls to > span an index stride, StringTreeWriter's writeStripe method will get an > exception from TreeWriter's writeStripe method > Column has wrong number of index entries found: x expected: y > This is caused by rowIndexValueCount having multiple entries equal to the > number of non-null rows in the column, combined with the fact that > StringTreeWriter has special logic for constructing its index.
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634624#comment-13634624 ] Owen O'Malley commented on HIVE-4305: - Carl, I fully acknowledge that ant vs maven is a religious discussion. However, to back up my five points:
* IDE support is much better. From http://www.jetbrains.com/idea/features/ant_maven.html : Maven integration reads the files and builds the modules and dependencies between them. Ant integration executes ant targets. Eclipse is similar. For Maven projects, you don't need to maintain a set of helper files that set up the project in the IDE; the IDE can build it automatically. Even with our eclipse helper scripts, users give up on building Hive in an IDE.
* Offline support is much better. Try turning off the internet and building Hive: it is relatively difficult. Maven will just work if you have the required jars in your cache.
* You can download a Maven project and build it without reading the build file. This is obviously true from the fundamentals of each system. Ant provides a wide-open playing field: you can build "tar" in one project and "package" in another, and there are no rules. In Maven, I know what "package" will build.
* Publishing to Maven central is much easier. Ivy can't publish to Maven central, so you end up using ant's maven tasks to publish. This requires that you have two different descriptions of the project's dependencies: one for ivy and one for ant's maven tasks. Furthermore, based on my experience as the release manager for Hadoop, ant's maven tasks are much more error-prone, and they don't support features like storing your password encrypted.
* Profiles work much better in Maven. Ok, this one is debatable. In my opinion, Maven profiles are cleaner and better designed.
Finally, I fully support Brock's point:
* Maven is used by the other Hadoop ecosystem projects.
Hadoop in particular was using ant, ivy, and maven ant tasks for a long time and traded them in for Maven. There is significant value in using similar tools. > Use a single system for dependency resolution > - > > Key: HIVE-4305 > URL: https://issues.apache.org/jira/browse/HIVE-4305 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure, HCatalog >Reporter: Travis Crawford > > Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy > for dependency resolution while HCatalog uses maven-ant-tasks. With the > project merge we should converge on a single tool for dependency resolution.
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635822#comment-13635822 ] Owen O'Malley commented on HIVE-4305: - bq. I have good news and bad news. I have better news: Maven handles offline by just passing in "-o". > Use a single system for dependency resolution > - > > Key: HIVE-4305 > URL: https://issues.apache.org/jira/browse/HIVE-4305 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure, HCatalog >Reporter: Travis Crawford >Assignee: Carl Steinbach > > Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy > for dependency resolution while HCatalog uses maven-ant-tasks. With the > project merge we should converge on a single tool for dependency resolution.
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635867#comment-13635867 ] Owen O'Malley commented on HIVE-4305: - Carl, the critical point is that you are having to fix the ant build file to make offline work. In Maven, it is built in and thus we have less to maintain. > Use a single system for dependency resolution > - > > Key: HIVE-4305 > URL: https://issues.apache.org/jira/browse/HIVE-4305 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure, HCatalog >Reporter: Travis Crawford >Assignee: Carl Steinbach > > Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy > for dependency resolution while HCatalog uses maven-ant-tasks. With the > project merge we should converge on a single tool for dependency resolution.
[jira] [Updated] (HIVE-4178) ORC fails with files with different numbers of columns
[ https://issues.apache.org/jira/browse/HIVE-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4178: Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) I just committed this to trunk and branch-11. Thanks, Kevin! > ORC fails with files with different numbers of columns > -- > > Key: HIVE-4178 > URL: https://issues.apache.org/jira/browse/HIVE-4178 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.11.0 >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Fix For: 0.11.0 > > Attachments: HIVE-4178.1.patch.txt > > > When CombineHiveInputFormat is used, it's possible that two files with > different numbers of columns can be included in the same split, in which case > Hive will fail at one of several points with an > ArrayIndexOutOfBoundsException. > This can happen when a partition contains empty files or two partitions are > read with different numbers of columns.
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636534#comment-13636534 ] Owen O'Malley commented on HIVE-4305: - Carl, Rather than debate it theoretically or compare it to Hadoop, which has a *LOT* more complexity in its build, I propose that we have Travis make a Maven build file for the combined Hive and HCat systems. Then we can debate the value and issues in the particular patch and how to move the project forward. The current state is painful with extremely long builds. We need to move forward and enable the project to evolve quickly so that Hive can compete with its many commercial competitors. > Use a single system for dependency resolution > - > > Key: HIVE-4305 > URL: https://issues.apache.org/jira/browse/HIVE-4305 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure, HCatalog >Reporter: Travis Crawford >Assignee: Carl Steinbach > > Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy > for dependency resolution while HCatalog uses maven-ant-tasks. With the > project merge we should converge on a single tool for dependency resolution.
[jira] [Updated] (HIVE-4189) ORC fails with String column that ends in lots of nulls
[ https://issues.apache.org/jira/browse/HIVE-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4189: Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) I just committed this to trunk and branch-0.11. Thanks, Kevin! > ORC fails with String column that ends in lots of nulls > --- > > Key: HIVE-4189 > URL: https://issues.apache.org/jira/browse/HIVE-4189 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.11.0 >Reporter: Kevin Wilfong >Assignee: Kevin Wilfong > Fix For: 0.11.0 > > Attachments: HIVE-4189.1.patch.txt, HIVE-4189.2.patch.txt > > > When ORC attempts to write out a string column that ends in enough nulls to > span an index stride, StringTreeWriter's writeStripe method will get an > exception from TreeWriter's writeStripe method > Column has wrong number of index entries found: x expected: y > This is caused by rowIndexValueCount having multiple entries equal to the > number of non-null rows in the column, combined with the fact that > StringTreeWriter has special logic for constructing its index.
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637405#comment-13637405 ] Owen O'Malley commented on HIVE-4305: - {quote} Owen, please give some concrete examples of things that make Hadoop's build more complex than Hive's. {quote} * It contains native executables. * It contains native libraries. * It contains jni libraries. {quote} I think it would be more pragmatic to spend time improving the build that we currently have {quote} Moving to Maven would be making it better in the opinion of the majority of the development community. The current Hive build is a complex mess, and the combination of Ivy and maven ant tasks is really hard to debug. Certainly, I believe it is possible to make things worse with Maven. I'm not a fan of how the Hadoop mavenization was done and I deeply regret not taking the time to make it better as it went in, but it was still better than the ant + ivy + maven ant tasks that we had. If it hadn't been, it would have been rejected. That said, in my experience most projects are better off with Maven builds than ant + ivy + maven ant tasks. > Use a single system for dependency resolution > - > > Key: HIVE-4305 > URL: https://issues.apache.org/jira/browse/HIVE-4305 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure, HCatalog >Reporter: Travis Crawford >Assignee: Carl Steinbach > > Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy > for dependency resolution while HCatalog uses maven-ant-tasks. With the > project merge we should converge on a single tool for dependency resolution.
[jira] [Created] (HIVE-4421) Improve memory usage by ORC dictionaries
Owen O'Malley created HIVE-4421: --- Summary: Improve memory usage by ORC dictionaries Key: HIVE-4421 URL: https://issues.apache.org/jira/browse/HIVE-4421 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4421: Fix Version/s: 0.11.0 Status: Patch Available (was: Open) This patch does three things: * Improves the memory usage while writing ORC dictionaries by removing the counts and just storing offsets instead of offsets and lengths. * Improves the tracking of how much memory is used by the dictionaries by tracking the allocation rather than the usage. * Reduces some of the allocation sizes of the integer arrays. > Improve memory usage by ORC dictionaries > > > Key: HIVE-4421 > URL: https://issues.apache.org/jira/browse/HIVE-4421 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.11.0 > > Attachments: HIVE-4421.D10545.1.patch > > > Currently, for tables with many string columns, it is possible to > significantly underestimate the memory used by the ORC dictionaries and cause > the query to run out of memory in the task.
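The offsets-only scheme described in the first bullet above can be sketched as follows. This is an illustrative model, not ORC's actual implementation; the class and method names are hypothetical.

```python
# Sketch: store a string dictionary as one concatenated byte buffer plus an
# offsets array. Entry i's length is offsets[i+1] - offsets[i], so per-entry
# lengths (and counts) never need to be stored separately.
class DictionaryBuffer:
    def __init__(self):
        self.data = bytearray()  # all entries, concatenated
        self.offsets = [0]       # offsets[i] = start of entry i in self.data

    def add(self, value: bytes) -> int:
        """Append a value and return its dictionary id."""
        self.data.extend(value)
        self.offsets.append(len(self.data))
        return len(self.offsets) - 2

    def get(self, i: int) -> bytes:
        """Recover entry i using only the offsets array."""
        return bytes(self.data[self.offsets[i]:self.offsets[i + 1]])

    def memory_used(self) -> int:
        # Per the second bullet, track what was allocated rather than what is
        # in use (here approximated as buffer bytes plus 4 bytes per offset).
        return len(self.data) + 4 * len(self.offsets)
```

For example, after `add(b"hive")` and `add(b"orc")`, `get(1)` reconstructs `b"orc"` from offsets 4 and 7 alone.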
[jira] [Created] (HIVE-4464) Hive's JDBC module doesn't compile under openjdk 7
Owen O'Malley created HIVE-4464: --- Summary: Hive's JDBC module doesn't compile under openjdk 7 Key: HIVE-4464 URL: https://issues.apache.org/jira/browse/HIVE-4464 Project: Hive Issue Type: Task Reporter: Owen O'Malley Assignee: Owen O'Malley Hive currently fails to compile when compiled with openjdk 7.
[jira] [Commented] (HIVE-8880) non-synchronized access to split list in OrcInputFormat
[ https://issues.apache.org/jira/browse/HIVE-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236177#comment-14236177 ] Owen O'Malley commented on HIVE-8880: - +1, this is good. > non-synchronized access to split list in OrcInputFormat > --- > > Key: HIVE-8880 > URL: https://issues.apache.org/jira/browse/HIVE-8880 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.14.1 > > Attachments: HIVE-8880.patch > > > When adding delta files to the list of orc splits access to the list is not > synchronized though it is shared across threads. All other additions to the > list are synchronized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240415#comment-14240415 ] Owen O'Malley commented on HIVE-8966: - Alan, your patch looks good. +1 > Delta files created by hive hcatalog streaming cannot be compacted > -- > > Key: HIVE-8966 > URL: https://issues.apache.org/jira/browse/HIVE-8966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 0.14.0 > Environment: hive >Reporter: Jihong Liu >Assignee: Alan Gates >Priority: Critical > Fix For: 0.14.1 > > Attachments: HIVE-8966.2.patch, HIVE-8966.patch > > > hive hcatalog streaming will also create a file like bucket_n_flush_length in > each delta directory, where "n" is the bucket number. But > compactor.CompactorMR thinks this file also needs to be compacted. However this > file of course cannot be compacted, so compactor.CompactorMR will not > continue to do the compaction. > In a test, after removing the bucket_n_flush_length file, the "alter > table partition compact" finished successfully. If that file isn't deleted, > nothing is compacted. > This is probably a very high severity bug. Both 0.13 and 0.14 have this issue
[jira] [Commented] (HIVE-9166) Place an upper bound for SARG CNF conversion
[ https://issues.apache.org/jira/browse/HIVE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252406#comment-14252406 ] Owen O'Malley commented on HIVE-9166: - +1, LGTM. You probably should add a test case where there is something other than the large CNF, something like (and leaf-1 (or ...)). You should end up with leaf-1 as your final expression. > Place an upper bound for SARG CNF conversion > > > Key: HIVE-9166 > URL: https://issues.apache.org/jira/browse/HIVE-9166 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0, 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9166.1.patch, HIVE-9166.2.patch > > > SARG creation in ORC applies several optimizations to the expression tree. Among > them, CNF conversion is an exponential algorithm, as it finds all combinations > of expressions when converting from OR of AND form to AND of OR form (CNF). > We need an upper bound for this algorithm to prevent it from running for a long > time and generating a huge combinations list.
[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267993#comment-14267993 ] Owen O'Malley commented on HIVE-9188: - I'm concerned about the size of the bloom filters and making them an integrated part of the column statistics. I think we'd do much better to make a BLOOM_FILTER stream kind and place them in a completely separate stream. That would allow the predicate push down to only load the bloom filters for the columns that it needs. > BloomFilter in ORC row group index > -- > > Key: HIVE-9188 > URL: https://issues.apache.org/jira/browse/HIVE-9188 > Project: Hive > Issue Type: New Feature > Components: File Formats >Affects Versions: 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, > HIVE-9188.4.patch > > > BloomFilters are well known probabilistic data structure for set membership > checking. We can use bloom filters in ORC index for better row group pruning. > Currently, ORC row group index uses min/max statistics to eliminate row > groups (stripes as well) that do not satisfy predicate condition specified in > the query. But in some cases, the efficiency of min/max based elimination is > not optimal (unsorted columns with wide range of entries). Bloom filters can > be an effective and efficient alternative for row group/split elimination for > point queries or queries with IN clause.
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268053#comment-14268053 ] Owen O'Malley commented on HIVE-4639: - You should encode four values: no_values, all_nulls, some_nulls, no_nulls. This will allow you to support a richer set of sargs. > Add has null flag to ORC internal index > --- > > Key: HIVE-4639 > URL: https://issues.apache.org/jira/browse/HIVE-4639 > Project: Hive > Issue Type: Improvement > Components: File Formats >Reporter: Owen O'Malley >Assignee: Prasanth Jayachandran > Attachments: HIVE-4639.1.patch > > > It would enable more predicate pushdown if we added a flag to the index entry > recording if there were any null values in the column for the 10k rows.
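The four-valued flag suggested in the comment above can drive richer SARG evaluation along these lines. This is a hypothetical sketch, not ORC's actual API; the names `NullState`, `can_skip_is_null`, and `can_skip_is_not_null` are illustrative.

```python
from enum import Enum

class NullState(Enum):
    NO_VALUES = 0   # row group has no values at all
    ALL_NULLS = 1   # every value in the row group is null
    SOME_NULLS = 2  # a mix of null and non-null values
    NO_NULLS = 3    # no nulls in the row group

def can_skip_is_null(state: NullState) -> bool:
    """For a `col IS NULL` predicate, skip row groups that cannot match."""
    return state in (NullState.NO_VALUES, NullState.NO_NULLS)

def can_skip_is_not_null(state: NullState) -> bool:
    """For a `col IS NOT NULL` predicate, skip row groups that cannot match."""
    return state in (NullState.NO_VALUES, NullState.ALL_NULLS)
```

A single boolean "has nulls" flag could not distinguish ALL_NULLS from SOME_NULLS, which is exactly the distinction an `IS NOT NULL` predicate needs.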
[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268176#comment-14268176 ] Owen O'Malley commented on HIVE-9188: - [~gopalv] I don't understand your concern. The indexes are already stored in ROW_INDEX streams. I'm just saying that the bloom filters, which are much larger than the rest of the ROW_INDEX be split into a BLOOM_FILTER stream instead of bundled in with the ROW_INDEX stream. That would let you load just the ROW_INDEX if you don't need the bloom filter. The size of the bloom filter needs to be changed relative to the number of items. You've sized them for the default row group size (n = 10,000, p=0.05) -> 7.8kb. To use them at the file level, you'd need to make the bloom filters much much much larger. For a file with 100 million values in a column, you'd need a 74mb bloom filter. I'd propose that you only do the bloom filters at the row group level and scale them to match the row index stride rather than just use the default 10k. > BloomFilter in ORC row group index > -- > > Key: HIVE-9188 > URL: https://issues.apache.org/jira/browse/HIVE-9188 > Project: Hive > Issue Type: New Feature > Components: File Formats >Affects Versions: 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, > HIVE-9188.4.patch > > > BloomFilters are well known probabilistic data structure for set membership > checking. We can use bloom filters in ORC index for better row group pruning. > Currently, ORC row group index uses min/max statistics to eliminate row > groups (stripes as well) that do not satisfy predicate condition specified in > the query. But in some cases, the efficiency of min/max based elimination is > not optimal (unsorted columns with wide range of entries). 
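The sizes quoted in the comment above follow from the standard bloom filter sizing formula m = -n ln(p) / (ln 2)^2. A quick check (figures are approximate; the comment's numbers round the same way):

```python
import math

def bloom_filter_bytes(n: int, p: float) -> float:
    """Optimal bloom filter size in bytes for n items at false-positive rate p."""
    bits = -n * math.log(p) / (math.log(2) ** 2)
    return bits / 8

# Default row group: n = 10,000 and p = 0.05 gives ~7.8 kB per filter.
row_group = bloom_filter_bytes(10_000, 0.05)

# File level: 100 million values in a column needs roughly 74 MiB.
file_level = bloom_filter_bytes(100_000_000, 0.05)
```

This is why scaling the filter to the configured row index stride, rather than assuming the default 10k, matters: the required size grows linearly with the number of distinct insertions.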
Bloom filters can > be an effective and efficient alternative for row group/split elimination for > point queries or queries with IN clause.
[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268573#comment-14268573 ] Owen O'Malley commented on HIVE-9188: - [~prasanth_j] Ok, I thought that you said that you were going to have bloom filters at row group, stripe, and file level. I agree completely that ORC should only have bloom filters at the row group level. Having the bloom filter as a separate stream means the reader does *far* less IO. It will still go through the code that merges adjacent ranges together into a single read. So if you need all of the indexes and bloom filters for all of the columns the reader should read them in a single IO operation. On the other hand, if it doesn't need any bloom filter it shouldn't have to load the extra mb of data it doesn't need. > BloomFilter in ORC row group index > -- > > Key: HIVE-9188 > URL: https://issues.apache.org/jira/browse/HIVE-9188 > Project: Hive > Issue Type: New Feature > Components: File Formats >Affects Versions: 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, > HIVE-9188.4.patch > > > BloomFilters are well known probabilistic data structure for set membership > checking. We can use bloom filters in ORC index for better row group pruning. > Currently, ORC row group index uses min/max statistics to eliminate row > groups (stripes as well) that do not satisfy predicate condition specified in > the query. But in some cases, the efficiency of min/max based elimination is > not optimal (unsorted columns with wide range of entries). Bloom filters can > be an effective and efficient alternative for row group/split elimination for > point queries or queries with IN clause.
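The merging of adjacent byte ranges into a single read, mentioned in the comment above, can be sketched like this. It is a generic illustration of the technique, not Hive's actual reader code.

```python
def merge_ranges(ranges):
    """Coalesce (offset, length) ranges that touch or overlap into single reads."""
    merged = []
    for offset, length in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This range touches or overlaps the previous one: grow the
            # previous range instead of issuing a second read.
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, max(prev_len, offset + length - prev_off))
        else:
            merged.append((offset, length))
    return merged
```

So a ROW_INDEX stream immediately followed by a BLOOM_FILTER stream collapses into one IO operation, while a reader that skips the bloom filters simply never adds those ranges to the list.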
[jira] [Created] (HIVE-9317) move Microsoft copyright to NOTICE file
Owen O'Malley created HIVE-9317: --- Summary: move Microsoft copyright to NOTICE file Key: HIVE-9317 URL: https://issues.apache.org/jira/browse/HIVE-9317 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Fix For: 0.15.0 There are a set of files that still have the Microsoft copyright notices. Those notices need to be moved into NOTICES and replaced with the standard Apache headers. {code} ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java {code}
[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275997#comment-14275997 ] Owen O'Malley commented on HIVE-9188: - [~prasanth_j] Please remove the upper two levels of bloom filters. They are utterly useless. Their false positive rate will be far above 99%. They absolutely should not be stored in the column statistics. That will hurt the common ppd case and not help. > BloomFilter in ORC row group index > -- > > Key: HIVE-9188 > URL: https://issues.apache.org/jira/browse/HIVE-9188 > Project: Hive > Issue Type: New Feature > Components: File Formats >Affects Versions: 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, > HIVE-9188.4.patch > > > BloomFilters are well known probabilistic data structure for set membership > checking. We can use bloom filters in ORC index for better row group pruning. > Currently, ORC row group index uses min/max statistics to eliminate row > groups (stripes as well) that do not satisfy predicate condition specified in > the query. But in some cases, the efficiency of min/max based elimination is > not optimal (unsorted columns with wide range of entries). Bloom filters can > be an effective and efficient alternative for row group/split elimination for > point queries or queries with IN clause.
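The ">99% false positive" claim in the comment above can be checked with the standard formula p = (1 - e^(-kn/m))^k: a filter sized for one 10,000-row group, reused to cover millions of file-level values, saturates completely. The parameter values below (m = 62,352 bits, k = 4, matching n = 10,000 at p = 0.05) are illustrative, not taken from the patch.

```python
import math

def false_positive_rate(n_items: int, m_bits: int, k_hashes: int) -> float:
    """Expected false-positive rate of a bloom filter with m bits and k hashes
    after n items have been inserted."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

m, k = 62_352, 4  # sized for 10,000 items at p = 0.05

# At its design point the filter behaves as intended (~5% false positives),
# but fed 10 million file-level values nearly every bit is set, so almost
# every membership probe answers "maybe".
assert false_positive_rate(10_000, m, k) < 0.06
assert false_positive_rate(10_000_000, m, k) > 0.99
```

A filter that says "maybe" for essentially every key eliminates nothing, while still costing IO and metadata space, which is the argument for dropping the stripe- and file-level filters.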
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284927#comment-14284927 ] Owen O'Malley commented on HIVE-8966: - This looks good, Alan. +1 One minor nit is that the class javadoc for ValidReadTxnList has "And" instead of the intended "An". > Delta files created by hive hcatalog streaming cannot be compacted > -- > > Key: HIVE-8966 > URL: https://issues.apache.org/jira/browse/HIVE-8966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 0.14.0 > Environment: hive >Reporter: Jihong Liu >Assignee: Alan Gates >Priority: Critical > Fix For: 0.14.1 > > Attachments: HIVE-8966.2.patch, HIVE-8966.3.patch, HIVE-8966.4.patch, > HIVE-8966.5.patch, HIVE-8966.patch > > > hive hcatalog streaming will also create a file like bucket_n_flush_length in > each delta directory, where "n" is the bucket number. But > compactor.CompactorMR thinks this file also needs to be compacted. However this > file of course cannot be compacted, so compactor.CompactorMR will not > continue to do the compaction. > In a test, after removing the bucket_n_flush_length file, the "alter > table partition compact" finished successfully. If that file isn't deleted, > nothing is compacted. > This is probably a very high severity bug. Both 0.13 and 0.14 have this issue
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284935#comment-14284935 ] Owen O'Malley commented on HIVE-8966: - After a little more thought, I'm worried that someone will accidentally create a ValidCompactorTxnList and get confused by the different behavior. I think it would make sense to move it into the compactor package to minimize the chance that someone accidentally uses it by mistake. > Delta files created by hive hcatalog streaming cannot be compacted > -- > > Key: HIVE-8966 > URL: https://issues.apache.org/jira/browse/HIVE-8966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 0.14.0 > Environment: hive >Reporter: Jihong Liu >Assignee: Alan Gates >Priority: Critical > Fix For: 0.14.1 > > Attachments: HIVE-8966.2.patch, HIVE-8966.3.patch, HIVE-8966.4.patch, > HIVE-8966.5.patch, HIVE-8966.patch > > > hive hcatalog streaming will also create a file like bucket_n_flush_length in > each delta directory, where "n" is the bucket number. But > compactor.CompactorMR thinks this file also needs to be compacted. However this > file of course cannot be compacted, so compactor.CompactorMR will not > continue to do the compaction. > In a test, after removing the bucket_n_flush_length file, the "alter > table partition compact" finished successfully. If that file isn't deleted, > nothing is compacted. > This is probably a very high severity bug. Both 0.13 and 0.14 have this issue
[jira] [Created] (HIVE-9451) Add max size of column dictionaries to ORC metadata
Owen O'Malley created HIVE-9451: --- Summary: Add max size of column dictionaries to ORC metadata Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9467) ORC - sort dictionary streams to the end of the stripe
Owen O'Malley created HIVE-9467: --- Summary: ORC - sort dictionary streams to the end of the stripe Key: HIVE-9467 URL: https://issues.apache.org/jira/browse/HIVE-9467 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley When reading ORC files, it would be convenient to group the dictionary streams at the end of the stripe. This would allow the reader to use fewer read operations if they want to load the dictionaries before they load the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Attachment: hive-9327.txt This patch changes no code, just puts the required Apache header on the source files and moves Microsoft's copyright notice to the NOTICE file. > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley > Fix For: 0.15.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Priority: Blocker (was: Major) > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.15.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Status: Patch Available (was: Open) > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.15.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HIVE-9317: --- Assignee: Owen O'Malley > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.15.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Resolution: Fixed Fix Version/s: 1.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this. Thanks for the review, Alan. > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.15.0, 1.0.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297178#comment-14297178 ] Owen O'Malley commented on HIVE-9451: - We should also record the stripe size that was used as the file was written. That gives a strict upper bound on the size of memory in the writer. > Add max size of column dictionaries to ORC metadata > --- > > Key: HIVE-9451 > URL: https://issues.apache.org/jira/browse/HIVE-9451 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley > > To predict the amount of memory required to read an ORC file we need to know > the size of the dictionaries for the columns that we are reading. I propose > adding the number of bytes for each column's dictionary to the stripe's > column statistics. The file's column statistics would have the maximum > dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297319#comment-14297319 ] Owen O'Malley commented on HIVE-9317: - +1 to not rolling a new RC specifically for this one. I just want to make sure it goes into any new RCs. > move Microsoft copyright to NOTICE file > --- > > Key: HIVE-9317 > URL: https://issues.apache.org/jira/browse/HIVE-9317 > Project: Hive > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 0.15.0, 1.0.0 > > Attachments: hive-9327.txt > > > There are a set of files that still have the Microsoft copyright notices. > Those notices need to be moved into NOTICES and replaced with the standard > Apache headers. > {code} > ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java > ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java > ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java > ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302507#comment-14302507 ] Owen O'Malley commented on HIVE-9188: - Suggestions: * Pick m to always be a multiple of 64 (since you are using longs as the representation) * change the representation of BloomFilter in orc_proto to record the number of hash functions and not the size or fpp. * use fixed64 for the bit field * you'll also need to update the specification in the wiki with the change to the format (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-orc-specORCFormatSpecification) * revert the spurious change to CliDriver.java * revert the spurious change to .gitignore * it seems suboptimal to convert long values to bytes before hashing > BloomFilter in ORC row group index > -- > > Key: HIVE-9188 > URL: https://issues.apache.org/jira/browse/HIVE-9188 > Project: Hive > Issue Type: New Feature > Components: File Formats >Affects Versions: 0.15.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: orcfile > Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, > HIVE-9188.4.patch, HIVE-9188.5.patch, HIVE-9188.6.patch > > > BloomFilters are a well-known probabilistic data structure for set membership > checking. We can use bloom filters in the ORC index for better row group pruning. > Currently, ORC row group index uses min/max statistics to eliminate row > groups (stripes as well) that do not satisfy the predicate condition specified in > the query. But in some cases, the efficiency of min/max based elimination is > not optimal (unsorted columns with a wide range of entries). Bloom filters can > be an effective and efficient alternative for row group/split elimination for > point queries or queries with an IN clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
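The first suggestion can be sketched with the standard bloom filter sizing formulas (hypothetical helper names, not Hive's actual BloomFilter code): round the bit count m up to a multiple of 64 so it packs exactly into long words, then derive the hash count k from the final m.

```java
// Sketch only: standard bloom filter sizing, with the bit count rounded up
// to a multiple of 64 so the bit set packs exactly into an array of longs.
public class BloomFilterSizing {
    // optimal bit count for n entries at false-positive probability p,
    // rounded up to a whole number of 64-bit words
    public static int optimalNumBits(long n, double p) {
        int bits = (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        return ((bits + 63) / 64) * 64;
    }

    // optimal number of hash functions for n entries in m bits
    public static int optimalNumHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        int m = optimalNumBits(10000, 0.05);
        int k = optimalNumHashes(10000, m);
        System.out.println(m + " bits, " + k + " hash functions");
    }
}
```

Recording only k (and the raw long words) in orc_proto, as suggested, is then sufficient: m is implied by the length of the fixed64 list.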
[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams
[ https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9593: Status: Patch Available (was: Open) > ORC Reader should ignore unknown metadata streams > -- > > Key: HIVE-9593 > URL: https://issues.apache.org/jira/browse/HIVE-9593 > Project: Hive > Issue Type: Bug > Components: File Formats >Affects Versions: 0.13.1, 0.12.0, 0.11.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Gopal V >Assignee: Owen O'Malley > Attachments: hive-9593.patch > > > ORC readers should ignore metadata streams which are non-essential additions > to the main data streams. > This will include additional indices, histograms or anything we add as an > optional stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams
[ https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9593: Attachment: hive-9593.patch This patch changes all of the required fields to be optional. I've gone through the current code to ensure that null pointers from getKind() won't cause an NPE. > ORC Reader should ignore unknown metadata streams > -- > > Key: HIVE-9593 > URL: https://issues.apache.org/jira/browse/HIVE-9593 > Project: Hive > Issue Type: Bug > Components: File Formats >Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0 >Reporter: Gopal V >Assignee: Owen O'Malley > Attachments: hive-9593.patch > > > ORC readers should ignore metadata streams which are non-essential additions > to the main data streams. > This will include additional indices, histograms or anything we add as an > optional stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
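The effect of making the fields optional can be illustrated with a toy reader (simplified stand-ins, not ORC's generated protobuf classes): a stream kind the reader does not recognize parses to null and is skipped, instead of failing the whole read.

```java
// Illustrative only: a reader that tolerates stream kinds it doesn't know.
import java.util.Arrays;
import java.util.List;

public class SkipUnknownStreams {
    enum Kind { DATA, LENGTH, DICTIONARY_DATA }   // kinds this reader understands

    // Returns null for an unrecognized kind, mirroring how an optional proto
    // field lets an old reader see "no value" instead of a parse failure.
    static Kind parseKind(String wire) {
        try {
            return Kind.valueOf(wire);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static int countReadable(List<String> streams) {
        int readable = 0;
        for (String s : streams) {
            if (parseKind(s) == null) {
                continue;                          // unknown stream: ignore, don't fail
            }
            readable++;
        }
        return readable;
    }

    public static void main(String[] args) {
        // an older reader encountering a newer file with a HISTOGRAM stream
        List<String> streams = Arrays.asList("DATA", "HISTOGRAM", "LENGTH");
        System.out.println(countReadable(streams));  // the unknown HISTOGRAM is skipped
    }
}
```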
[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams
[ https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9593: Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Status: Resolved (was: Patch Available) I committed this. Thanks for the review, Gopal! > ORC Reader should ignore unknown metadata streams > -- > > Key: HIVE-9593 > URL: https://issues.apache.org/jira/browse/HIVE-9593 > Project: Hive > Issue Type: Bug > Components: File Formats >Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0 >Reporter: Gopal V >Assignee: Owen O'Malley > Fix For: 1.0.1, 1.1.0 > > Attachments: HIVE-9593.no-autogen.patch, hive-9593.patch > > > ORC readers should ignore metadata streams which are non-essential additions > to the main data streams. > This will include additional indices, histograms or anything we add as an > optional stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15375) Port ORC-115 to storage-api
Owen O'Malley created HIVE-15375: Summary: Port ORC-115 to storage-api Key: HIVE-15375 URL: https://issues.apache.org/jira/browse/HIVE-15375 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, VectorizedRowBatch.toString() assumes that all BytesColumnVector's use the internal buffer for all of the values. This leads to incorrect strings in many common cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15419) Separate out storage-api to be released independently
Owen O'Malley created HIVE-15419: Summary: Separate out storage-api to be released independently Key: HIVE-15419 URL: https://issues.apache.org/jira/browse/HIVE-15419 Project: Hive Issue Type: Task Components: storage-api Reporter: Owen O'Malley Currently, the Hive project produces a single monolithic release, which makes reading directly into Hive's vectorized row batches a circular dependency for file formats. Storage-api is a small module containing the vectorized row batches and SearchArgument classes that are necessary for efficient vectorized reads and writes. By releasing storage-api independently, we can provide an interface that the file formats can read from and write to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15643) remove use of default charset in FastHiveDecimal
Owen O'Malley created HIVE-15643: Summary: remove use of default charset in FastHiveDecimal Key: HIVE-15643 URL: https://issues.apache.org/jira/browse/HIVE-15643 Project: Hive Issue Type: Bug Reporter: Owen O'Malley HIVE-15335 introduced some new uses of String.getBytes(), which uses the default charset. These need to be replaced with the version that always uses UTF-8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
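A minimal demonstration of the difference: String.getBytes() encodes with the JVM's default charset, so non-ASCII characters can produce different bytes on different platforms, while the StandardCharsets.UTF_8 overload is deterministic everywhere.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    public static void main(String[] args) {
        String s = "décimal";
        // Platform-dependent: the result varies with the JVM's default charset.
        byte[] platform = s.getBytes();
        // Deterministic: always the UTF-8 encoding, regardless of platform.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);  // 8 on every platform ("é" is two bytes in UTF-8)
    }
}
```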
[jira] [Created] (HIVE-15841) Upgrade Hive to ORC 1.3.2
Owen O'Malley created HIVE-15841: Summary: Upgrade Hive to ORC 1.3.2 Key: HIVE-15841 URL: https://issues.apache.org/jira/browse/HIVE-15841 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.2 once it is released. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-15922) SchemaEvolution must guarantee that getFileIncluded is not null
Owen O'Malley created HIVE-15922: Summary: SchemaEvolution must guarantee that getFileIncluded is not null Key: HIVE-15922 URL: https://issues.apache.org/jira/browse/HIVE-15922 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.1.1 Reporter: Owen O'Malley Fix For: 2.1.2 This only impacts branch-2.1, because it is already fixed in master by HIVE-14007. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-15929) Fix HiveDecimalWritable
Owen O'Malley created HIVE-15929: Summary: Fix HiveDecimalWritable Key: HIVE-15929 URL: https://issues.apache.org/jira/browse/HIVE-15929 Project: Hive Issue Type: Bug Reporter: Owen O'Malley HIVE-15335 broke compatibility with Hive 2.1 by making HiveDecimalWritable.getInternalStorage() throw an exception when called on an unset value. It is easy to instead return an empty array, which will allow the old code to allocate a new array. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
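A pared-down sketch of the proposed behavior (hypothetical class and fields, not the real HiveDecimalWritable): an unset writable returns an empty array instead of throwing, so older callers can still use the result to size a new allocation.

```java
// Hypothetical sketch of the fix, not Hive's actual HiveDecimalWritable.
public class DecimalWritableSketch {
    private long[] storage;           // null until set(...) is called
    private boolean isSet = false;

    public void set(long[] words) {
        storage = words.clone();
        isSet = true;
    }

    public long[] getInternalStorage() {
        if (!isSet) {
            // Broken behavior: throw on an unset value.
            // Compatible behavior: return an empty array the caller can grow from.
            return new long[0];
        }
        return storage;
    }

    public static void main(String[] args) {
        DecimalWritableSketch w = new DecimalWritableSketch();
        System.out.println(w.getInternalStorage().length);  // 0, no exception
    }
}
```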
[jira] [Created] (HIVE-16549) Fix an incompatible change in PredicateLeafImpl from HIVE-15269
Owen O'Malley created HIVE-16549: Summary: Fix an incompatible change in PredicateLeafImpl from HIVE-15269 Key: HIVE-16549 URL: https://issues.apache.org/jira/browse/HIVE-16549 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley HIVE-15269 added a parameter to the constructor for PredicateLeafImpl for a configuration object. The configuration object is only used for the new LiteralDelegates. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16683) ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files
Owen O'Malley created HIVE-16683: Summary: ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files Key: HIVE-16683 URL: https://issues.apache.org/jira/browse/HIVE-16683 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.1.1, 2.2.0 Reporter: Owen O'Malley Assignee: Owen O'Malley This only impacts branch-2.1 and branch-2.2, because it has been fixed in the ORC project's code base via ORC-125. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16787) Fix itests in branch-2.2
Owen O'Malley created HIVE-16787: Summary: Fix itests in branch-2.2 Key: HIVE-16787 URL: https://issues.apache.org/jira/browse/HIVE-16787 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.2.0 The itests are broken in branch 2.2 and need to be fixed before release. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-17118) Clean up of HIVE-14309 to move the orc source code to org.apache.hive.orc
Owen O'Malley created HIVE-17118: Summary: Clean up of HIVE-14309 to move the orc source code to org.apache.hive.orc Key: HIVE-17118 URL: https://issues.apache.org/jira/browse/HIVE-17118 Project: Hive Issue Type: Bug Components: ORC Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.2.0 Just for branch-2.2. HIVE-14309 shaded the hive-orc jar to use a unique package org.apache.hive.orc package. This patch moves the source files over to the right directory and removes the shading. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17154) fix rat problems in branch-2.2
Owen O'Malley created HIVE-17154: Summary: fix rat problems in branch-2.2 Key: HIVE-17154 URL: https://issues.apache.org/jira/browse/HIVE-17154 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Fix rat problems in the branch-2.2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17171) Remove old javadoc versions
Owen O'Malley created HIVE-17171: Summary: Remove old javadoc versions Key: HIVE-17171 URL: https://issues.apache.org/jira/browse/HIVE-17171 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley We currently have a lot of old javadoc versions. I'd propose that we keep the following versions: * r1.2.2 * r2.1.1 * r2.2.0 (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest we remove: * hcat-r0.5.0 * r0.10.0 * r0.11.0 * r0.12.0 * r0.13.1 * r1.0.1 * r1.1.1 * r2.0.1 Any concerns? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17173) Add some convenience redirects to the Hive site
Owen O'Malley created HIVE-17173: Summary: Add some convenience redirects to the Hive site Key: HIVE-17173 URL: https://issues.apache.org/jira/browse/HIVE-17173 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley I'd propose that we add the following redirects to our site's .htaccess: * http://hive.apache.org/bugs -> https://issues.apache.org/jira/browse/hive * http://hive.apache.org/downloads -> https://www.apache.org/dyn/closer.cgi/hive/ * http://hive.apache.org/releases -> https://hive.apache.org/docs/downloads.html * http://hive.apache.org/src -> https://github.com/apache/hive * http://hive.apache.org/web-src -> https://svn.apache.org/repos/asf/hive/cms/trunk Thoughts? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17924) Restore SerDe by reverting HIVE-15167 to unbreak API compatibility
Owen O'Malley created HIVE-17924: Summary: Restore SerDe by reverting HIVE-15167 to unbreak API compatibility Key: HIVE-17924 URL: https://issues.apache.org/jira/browse/HIVE-17924 Project: Hive Issue Type: Bug Affects Versions: 2.3.0, 2.3.1 Reporter: Owen O'Malley Assignee: Owen O'Malley HIVE-15167 broke compatibility badly for very little gain and caused a lot of pain for our users. We should revert it and restore the SerDe interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17925) Fix TestHooks so that it avoids ClassNotFound on teardown
Owen O'Malley created HIVE-17925: Summary: Fix TestHooks so that it avoids ClassNotFound on teardown Key: HIVE-17925 URL: https://issues.apache.org/jira/browse/HIVE-17925 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley TestHooks gets a ClassNotFound exception during teardown, which messes up some following tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-19013) Fix some minor build issues in storage-api
Owen O'Malley created HIVE-19013: Summary: Fix some minor build issues in storage-api Key: HIVE-19013 URL: https://issues.apache.org/jira/browse/HIVE-19013 Project: Hive Issue Type: Bug Components: storage-api Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, the storage-api tests complain that there isn't a log4j2.xml and the javadoc fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC
Owen O'Malley created HIVE-20135: Summary: Fix incompatible change in TimestampColumnVector to default to UTC Key: HIVE-20135 URL: https://issues.apache.org/jira/browse/HIVE-20135 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Jesus Camacho Rodriguez HIVE-20007 changed the default for TimestampColumnVector to be to use UTC, which breaks the API compatibility with storage-api 2.6. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-12638) Hive should not create empty files in partitions
Owen O'Malley created HIVE-12638: Summary: Hive should not create empty files in partitions Key: HIVE-12638 URL: https://issues.apache.org/jira/browse/HIVE-12638 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Currently Hive creates empty files for buckets with no rows in a directory. I believe this was originally because the SMB and bucket join require files to be present to get InputSplits. There are customers where this behavior leads to the creation of more than 200,000 empty ORC files per hour on a cluster (with peaks of more than 725,000 per hour). We've also seen instances where a single DataNode is involved in 5600 of these empty ORC files within a 2-minute period. This causes significant stress on HDFS at both the NameNode and DataNode and is completely unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12838) Add methods for getting and storing serialized ORC file tails
Owen O'Malley created HIVE-12838: Summary: Add methods for getting and storing serialized ORC file tails Key: HIVE-12838 URL: https://issues.apache.org/jira/browse/HIVE-12838 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Provide a pair of routines for getting and restoring from a serialized file footer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams
Owen O'Malley created HIVE-13232: Summary: Aggressively drop compression buffers in ORC OutStreams Key: HIVE-13232 URL: https://issues.apache.org/jira/browse/HIVE-13232 Project: Hive Issue Type: Bug Components: ORC Reporter: Owen O'Malley Assignee: Owen O'Malley In Hive 0.11, when ORC's OutStreams were flushed, they dropped all of their buffers. In the patch for HIVE-4342, we inadvertently changed that behavior so that one of the buffers is held on to. For queries with a lot of writers, and thus under significant memory pressure, this can have a significant impact on memory usage. Note that "hive.optimize.sort.dynamic.partition" avoids this problem by sorting on the dynamic partition key so that only a single ORC writer is open at once. This will use memory more effectively and avoid creating ORC files with very small stripes, which will produce better downstream performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13464) Backport changes to storage-api into branch 2 for release into 2.0.1
Owen O'Malley created HIVE-13464: Summary: Backport changes to storage-api into branch 2 for release into 2.0.1 Key: HIVE-13464 URL: https://issues.apache.org/jira/browse/HIVE-13464 Project: Hive Issue Type: Bug Components: storage-api Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.1 To release ORC as a separate project, backporting the safe changes for storage-api to 2.0.1 will minimize the disruption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13763) Update smart-apply-patch.sh with ability to use patches from git
Owen O'Malley created HIVE-13763: Summary: Update smart-apply-patch.sh with ability to use patches from git Key: HIVE-13763 URL: https://issues.apache.org/jira/browse/HIVE-13763 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, the smart-apply-patch.sh doesn't understand git patches. It is relatively easy to make it understand patches generated by: {code} % git format-patch apache/master --stdout > HIVE-999.patch {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13906) Remove guava dependence from storage-api module
Owen O'Malley created HIVE-13906: Summary: Remove guava dependence from storage-api module Key: HIVE-13906 URL: https://issues.apache.org/jira/browse/HIVE-13906 Project: Hive Issue Type: Bug Components: storage-api Reporter: Owen O'Malley Assignee: Owen O'Malley Guava is a very problematic library to depend on because of version incompatibilities, and its use in the storage-api module causes it to leak into everything that depends on storage-api. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14007) Replace ORC module with ORC release
Owen O'Malley created HIVE-14007: Summary: Replace ORC module with ORC release Key: HIVE-14007 URL: https://issues.apache.org/jira/browse/HIVE-14007 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.2.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.2.0 This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14166) Minor updates to the website.
Owen O'Malley created HIVE-14166: Summary: Minor updates to the website. Key: HIVE-14166 URL: https://issues.apache.org/jira/browse/HIVE-14166 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Minor updates to the website & documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14220) Protect users from Reader.rows(Options) modifying the Options object
Owen O'Malley created HIVE-14220: Summary: Protect users from Reader.rows(Options) modifying the Options object Key: HIVE-14220 URL: https://issues.apache.org/jira/browse/HIVE-14220 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley This is a matching fix to HIVE-14004, where ACID was getting into trouble because it was reusing the Reader.Options argument between files and Reader.rows was modifying it. HIVE-14004 just fixed the Hive case, but we need a corresponding fix over here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
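The defensive-copy pattern being described can be sketched as follows (toy classes with hypothetical names, not ORC's actual Reader API): rows() clones the caller's Options before adjusting it, so the same Options instance can safely be reused across files.

```java
// Toy sketch of the fix: never mutate the caller's Options argument.
public class ReaderSketch {
    static class Options implements Cloneable {
        long offset = 0;
        long length = Long.MAX_VALUE;   // "read the whole file" by default

        @Override
        public Options clone() {
            try {
                return (Options) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    private final long fileLength;

    ReaderSketch(long fileLength) {
        this.fileLength = fileLength;
    }

    Options rows(Options caller) {
        Options local = caller.clone();                  // defensive copy
        local.length = Math.min(local.length, fileLength); // adjust the copy only
        return local;
    }

    public static void main(String[] args) {
        Options shared = new Options();
        new ReaderSketch(100).rows(shared);              // reuse across files is now safe
        System.out.println(shared.length == Long.MAX_VALUE);  // caller's object untouched
    }
}
```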
[jira] [Created] (HIVE-14242) Backport ORC-53 to Hive
Owen O'Malley created HIVE-14242: Summary: Backport ORC-53 to Hive Key: HIVE-14242 URL: https://issues.apache.org/jira/browse/HIVE-14242 Project: Hive Issue Type: Bug Components: ORC Reporter: Owen O'Malley Assignee: Owen O'Malley ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem in TypeDescription that should be backported to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc
Owen O'Malley created HIVE-14309: Summary: Fix naming of classes in orc module to not conflict with standalone orc Key: HIVE-14309 URL: https://issues.apache.org/jira/browse/HIVE-14309 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc namespace that clash with the ORC project's classes. From Hive 2.2 onward, the classes will only be in ORC, but we'll reduce the classpath problems if we rename the classes to org.apache.hive.orc. I've looked at a set of projects (pig, spark, oozie, flume, & storm) and can't find any uses of Hive's versions of the org.apache.orc classes, so I believe this is a safe change that will reduce the integration problems downstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15124) Fix OrcInputFormat to use reader's schema for include boolean array
Owen O'Malley created HIVE-15124: Summary: Fix OrcInputFormat to use reader's schema for include boolean array Key: HIVE-15124 URL: https://issues.apache.org/jira/browse/HIVE-15124 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 2.1.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Currently, the OrcInputFormat uses the file's schema rather than the reader's schema. This means that SchemaEvolution fails with an ArrayIndexOutOfBoundsException if a partition has a different schema than the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10171) Create a storage-api module
Owen O'Malley created HIVE-10171: Summary: Create a storage-api module Key: HIVE-10171 URL: https://issues.apache.org/jira/browse/HIVE-10171 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley To support high performance file formats, I'd like to propose that we move the minimal set of classes that are required to integrate with Hive into a new module named "storage-api". This module will include VectorizedRowBatch, the various ColumnVector classes, and the SARG classes. It will form the start of an API that high performance storage formats can use to integrate with Hive. Both ORC and Parquet can use the new API to support vectorization and SARGs without performance-destroying shims. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
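The idea behind VectorizedRowBatch can be sketched in miniature: columnar batches hand the engine ~1024 rows at a time as primitive arrays instead of one Object per row. The `MiniRowBatch` and `LongColumn` names below are illustrative stand-ins, not the storage-api classes:

```java
public class MiniBatchDemo {
  // A column of longs with a parallel null mask, as in ColumnVector.
  static class LongColumn {
    long[] vector;
    boolean[] isNull;
    LongColumn(int size) {
      vector = new long[size];
      isNull = new boolean[size];
    }
  }

  // A batch of rows stored column-by-column, as in VectorizedRowBatch.
  static class MiniRowBatch {
    static final int DEFAULT_SIZE = 1024;
    int size;                 // rows actually filled in this batch
    LongColumn[] cols;
    MiniRowBatch(int numCols) {
      cols = new LongColumn[numCols];
      for (int i = 0; i < numCols; i++) {
        cols[i] = new LongColumn(DEFAULT_SIZE);
      }
    }
  }

  // A consumer sums a column with a tight loop over primitive arrays,
  // with no per-row object allocation or virtual calls.
  static long sumColumn(MiniRowBatch batch, int col) {
    long total = 0;
    for (int r = 0; r < batch.size; r++) {
      if (!batch.cols[col].isNull[r]) {
        total += batch.cols[col].vector[r];
      }
    }
    return total;
  }

  public static void main(String[] args) {
    MiniRowBatch batch = new MiniRowBatch(1);
    batch.size = 3;
    batch.cols[0].vector[0] = 1;
    batch.cols[0].vector[1] = 2;
    batch.cols[0].vector[2] = 39;
    System.out.println(sumColumn(batch, 0)); // 42
  }
}
```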
[jira] [Created] (HIVE-10305) TestOrcFile has a mistake that makes metadata test ineffective
Owen O'Malley created HIVE-10305: Summary: TestOrcFile has a mistake that makes metadata test ineffective Key: HIVE-10305 URL: https://issues.apache.org/jira/browse/HIVE-10305 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Two of the values being stored as user metadata in TestOrcFile.metaData weren't flipped and thus were empty buffers. The test passes because they are compared to empty buffers. We should fix the test so it performs the intended comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
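The unflipped-buffer mistake is easy to reproduce with plain java.nio: after writing, a buffer's position sits at the end of the data, so without `flip()` there is nothing left to read and a comparison against an empty buffer vacuously passes. A small sketch (method names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FlipDemo {
  // The bug: write bytes but forget to flip, so remaining() is 0 and the
  // buffer reads back as empty.
  static ByteBuffer fillWithoutFlip(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(bytes.length);
    buf.put(bytes);
    return buf;              // position == limit: looks empty to a reader
  }

  // The fix: flip() resets position to 0 and sets limit to the data end.
  static ByteBuffer fillAndFlip(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(bytes.length);
    buf.put(bytes);
    buf.flip();              // position = 0, limit = bytes written
    return buf;
  }

  public static void main(String[] args) {
    System.out.println(fillWithoutFlip("meta").remaining()); // 0
    System.out.println(fillAndFlip("meta").remaining());     // 4
  }
}
```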
[jira] [Created] (HIVE-10407) separate out the timestamp ranges for testing purposes
Owen O'Malley created HIVE-10407: Summary: separate out the timestamp ranges for testing purposes Key: HIVE-10407 URL: https://issues.apache.org/jira/browse/HIVE-10407 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Some platforms have limits for date ranges, so separate out the test cases that are outside of the range 1970 to 2038. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils
Owen O'Malley created HIVE-10794: Summary: Remove the dependence from ErrorMsg to HiveUtils Key: HIVE-10794 URL: https://issues.apache.org/jira/browse/HIVE-10794 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley HiveUtils has a large set of dependencies and ErrorMsg only needs the new line constant. Breaking the dependency will significantly reduce ErrorMsg's dependency footprint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10795) Remove use of PerfLogger from Orc
Owen O'Malley created HIVE-10795: Summary: Remove use of PerfLogger from Orc Key: HIVE-10795 URL: https://issues.apache.org/jira/browse/HIVE-10795 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley PerfLogger is yet another class with a huge dependency set that Orc doesn't need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10796) Remove dependencies on NumericHistogram and NumDistinctValueEstimator from JavaDataModel
Owen O'Malley created HIVE-10796: Summary: Remove dependencies on NumericHistogram and NumDistinctValueEstimator from JavaDataModel Key: HIVE-10796 URL: https://issues.apache.org/jira/browse/HIVE-10796 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley The JavaDataModel class is used in a lot of places and the non-general calculations are better done in the other classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10797) Simplify the test for vectorized input
Owen O'Malley created HIVE-10797: Summary: Simplify the test for vectorized input Key: HIVE-10797 URL: https://issues.apache.org/jira/browse/HIVE-10797 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley The call to Utilities.isVectorMode should be simplified for the readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10798) Remove dependence on VectorizedBatchUtil from VectorizedOrcAcidRowReader
Owen O'Malley created HIVE-10798: Summary: Remove dependence on VectorizedBatchUtil from VectorizedOrcAcidRowReader Key: HIVE-10798 URL: https://issues.apache.org/jira/browse/HIVE-10798 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley VectorizedBatchUtil has a lot of dependencies that Orc should avoid, so the code should be refactored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10799) Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc
Owen O'Malley created HIVE-10799: Summary: Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc Key: HIVE-10799 URL: https://issues.apache.org/jira/browse/HIVE-10799 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley SearchArgumentFactory and SearchArgumentImpl are high level and shouldn't depend on the internals of Hive's AST model. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11080) Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter
Owen O'Malley created HIVE-11080: Summary: Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter Key: HIVE-11080 URL: https://issues.apache.org/jira/browse/HIVE-11080 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the VectorizedRowBatch.toString method uses the VectorExpressionWriter to convert the row batch to a string. Since the string is only used for printing error messages, I'd propose making the toString use the types of the vector batch instead of the object inspector. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11086) Remove use of ErrorMsg in Orc's RunLengthIntegerReaderV2
Owen O'Malley created HIVE-11086: Summary: Remove use of ErrorMsg in Orc's RunLengthIntegerReaderV2 Key: HIVE-11086 URL: https://issues.apache.org/jira/browse/HIVE-11086 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley ORC's rle v2 reader uses a string literal from ErrorMsg, which forces a large dependency on the rle v2 reader. Pulling the string literal in directly doesn't change the behavior and fixes the linkage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11115) Remove dependence from ORC's WriterImpl to OrcInputFormat
Owen O'Malley created HIVE-11115: Summary: Remove dependence from ORC's WriterImpl to OrcInputFormat Key: HIVE-11115 URL: https://issues.apache.org/jira/browse/HIVE-11115 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently there is a link from WriterImpl to OrcInputFormat that should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11124) Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory
Owen O'Malley created HIVE-11124: Summary: Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory Key: HIVE-11124 URL: https://issues.apache.org/jira/browse/HIVE-11124 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory to avoid the extra dependence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
Owen O'Malley created HIVE-11137: Summary: In DateWritable remove the use of LazyBinaryUtils Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11144) Replace row by row reader and writer with shims to vectorized path.
Owen O'Malley created HIVE-11144: Summary: Replace row by row reader and writer with shims to vectorized path. Key: HIVE-11144 URL: https://issues.apache.org/jira/browse/HIVE-11144 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley The core ORC reader and writer will be better served if the vectorized read and write paths are the primary API and the row by row reader and writer and their corresponding object inspectors become Hive-specific shims. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11209) Clean up dependencies in HiveDecimalWritable
Owen O'Malley created HIVE-11209: Summary: Clean up dependencies in HiveDecimalWritable Key: HIVE-11209 URL: https://issues.apache.org/jira/browse/HIVE-11209 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently HiveDecimalWritable depends on: * org.apache.hadoop.hive.serde2.ByteStream * org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils * org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils Since we need HiveDecimalWritable for the decimal VectorizedColumnBatch, breaking these dependencies will improve things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11210) Remove dependency on HiveConf from Orc reader & writer
Owen O'Malley created HIVE-11210: Summary: Remove dependency on HiveConf from Orc reader & writer Key: HIVE-11210 URL: https://issues.apache.org/jira/browse/HIVE-11210 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the ORC reader and writer get their default values from HiveConf. I propose that we make the reader and writer have their own programmatic defaults and the OrcInputFormat and OrcOutputFormat can use the version in HiveConf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
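The "own programmatic defaults" pattern might look like the sketch below: each knob carries its default value, and any Properties-like source (HiveConf or otherwise) can override it. `OrcConfDemo`, the property names, and the default values are hypothetical, not the actual Hive configuration:

```java
import java.util.Properties;

public class OrcConfDemo {
  enum Knob {
    // Each option pairs a property name with a built-in default, so the
    // reader/writer works even without any Hive configuration present.
    STRIPE_SIZE("orc.stripe.size", 64L * 1024 * 1024),
    BUFFER_SIZE("orc.buffer.size", 256 * 1024);

    final String property;
    final long defaultValue;

    Knob(String property, long defaultValue) {
      this.property = property;
      this.defaultValue = defaultValue;
    }

    // Core code calls this; a null conf simply yields the default, while
    // OrcInputFormat/OrcOutputFormat can pass the Hive-level settings.
    long getLong(Properties conf) {
      if (conf != null) {
        String value = conf.getProperty(property);
        if (value != null) {
          return Long.parseLong(value);
        }
      }
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    System.out.println(Knob.BUFFER_SIZE.getLong(null)); // built-in default
  }
}
```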
[jira] [Created] (HIVE-11212) Create vectorized types for complex types
Owen O'Malley created HIVE-11212: Summary: Create vectorized types for complex types Key: HIVE-11212 URL: https://issues.apache.org/jira/browse/HIVE-11212 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley We need vectorized types for structs, maps, lists, and unions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11245) Fix the LLAP to ORC APIs
Owen O'Malley created HIVE-11245: Summary: Fix the LLAP to ORC APIs Key: HIVE-11245 URL: https://issues.apache.org/jira/browse/HIVE-11245 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Priority: Blocker Fix For: llap Currently the LLAP branch has refactored the ORC code to have different code paths depending on whether the data is coming from the cache or a FileSystem. We need to introduce a concept of a DataSource that is responsible for getting the necessary bytes regardless of whether they are coming from a FileSystem, in memory cache, or both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
Owen O'Malley created HIVE-11253: Summary: Move SearchArgument and VectorizedRowBatch classes to storage-api. Key: HIVE-11253 URL: https://issues.apache.org/jira/browse/HIVE-11253 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11307) Remove getWritableObject from ColumnVectorBatch
Owen O'Malley created HIVE-11307: Summary: Remove getWritableObject from ColumnVectorBatch Key: HIVE-11307 URL: https://issues.apache.org/jira/browse/HIVE-11307 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 ColumnVectorBatch.getWritableObject is only used in a few tests and is really problematic when adding the complex types to vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
Owen O'Malley created HIVE-11321: Summary: Move OrcFile.OrcTableProperties from OrcFile into OrcConf. Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11370) Extend SARGs to support binary type
Owen O'Malley created HIVE-11370: Summary: Extend SARGs to support binary type Key: HIVE-11370 URL: https://issues.apache.org/jira/browse/HIVE-11370 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Currently the sargs only apply to string, boolean, integer, decimal, floating, date, and timestamp columns. It would be good to support binary blobs also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11417) Create ObjectInspectors for VectorizedRowBatch
Owen O'Malley created HIVE-11417: Summary: Create ObjectInspectors for VectorizedRowBatch Key: HIVE-11417 URL: https://issues.apache.org/jira/browse/HIVE-11417 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley I'd like to make the default path for reading and writing ORC files to be vectorized. To ensure that Hive can still read row by row, I'll make ObjectInspectors that are backed by the VectorizedRowBatch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
Owen O'Malley created HIVE-11618: Summary: Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG Key: HIVE-11618 URL: https://issues.apache.org/jira/browse/HIVE-11618 Project: Hive Issue Type: Bug Components: Types Reporter: Owen O'Malley The Parquet binding leaked implementation details into the generic SARG api. Rather than make all users of the SARG api deal with each of the specific types, reunify the INTEGER and LONG types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11704) Create errata.txt file
Owen O'Malley created HIVE-11704: Summary: Create errata.txt file Key: HIVE-11704 URL: https://issues.apache.org/jira/browse/HIVE-11704 Project: Hive Issue Type: Bug Components: Documentation Reporter: Owen O'Malley Assignee: Owen O'Malley As discussed on the email list, we should have a file documenting known problems in the commit messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11807) Set ORC buffer size in relation to set stripe size
Owen O'Malley created HIVE-11807: Summary: Set ORC buffer size in relation to set stripe size Key: HIVE-11807 URL: https://issues.apache.org/jira/browse/HIVE-11807 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley A customer produced ORC files with very small stripe sizes (10k rows/stripe) by setting a small 64MB stripe size and 256K buffer size for a 54 column table. At that size, each of the streams gets only a buffer or two before the stripe size is reached. The current code uses the available memory instead of the stripe size and thus doesn't shrink the buffer size when the JVM has much more memory than the stripe size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
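To make the arithmetic concrete: a 64MB stripe split across 54 columns with roughly 3 streams per column leaves about 400KB per stream, so 256KB buffers allow only one or two per stream. A sketch of a heuristic that sizes buffers from the stripe instead of from JVM memory; `clampBufferSize` and the 3-streams-per-column estimate are assumptions, not the actual Hive fix:

```java
public class BufferSizeDemo {
  // Shrink the requested buffer size until several buffers fit in each
  // stream's share of the stripe, with a 4KB floor.
  static int clampBufferSize(long stripeSize, int numColumns,
                             int streamsPerColumn, int requestedBufferSize) {
    long perStream = stripeSize / ((long) numColumns * streamsPerColumn);
    int size = requestedBufferSize;
    while (size > perStream && size > 4 * 1024) {
      size /= 2;
    }
    return size;
  }

  public static void main(String[] args) {
    // The case above: 64MB stripe, 54 columns, ~3 streams per column
    // leaves ~414KB per stream, so a 256KB request is kept as-is...
    System.out.println(clampBufferSize(64L << 20, 54, 3, 256 * 1024));
    // ...but a narrow 1MB stripe shrinks the buffers sharply.
    System.out.println(clampBufferSize(1L << 20, 54, 3, 256 * 1024));
  }
}
```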
[jira] [Created] (HIVE-11808) In ORC removing the dynamic dispatch for StringTreeReader improves read by 10%
Owen O'Malley created HIVE-11808: Summary: In ORC removing the dynamic dispatch for StringTreeReader improves read by 10% Key: HIVE-11808 URL: https://issues.apache.org/jira/browse/HIVE-11808 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley When we introduced the dictionary/direct encodings for ORC, we made subclasses of StringTreeReader named StringDirectTreeReader and StringDictionaryTreeReader and introduced an additional dynamic dispatch in the inner loop. For tables with a lot of string columns, removing that extra dispatch improves performance by 10%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
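The speedup comes from hoisting the encoding decision out of the per-value loop. A schematic comparison of the two shapes; the class and method names are illustrative, not the ORC reader code:

```java
public class DispatchDemo {
  // Shape 1: a virtual call per value (the slower pattern).
  interface ValueReader {
    long read(long raw);
  }

  static long sumVirtual(ValueReader reader, long[] raw) {
    long total = 0;
    for (long v : raw) {
      total += reader.read(v);   // dynamic dispatch inside the inner loop
    }
    return total;
  }

  // Shape 2: branch once per batch, then run a tight monomorphic loop.
  static long sumHoisted(boolean dictionary, long[] dict, long[] raw) {
    long total = 0;
    if (dictionary) {
      for (long v : raw) {
        total += dict[(int) v];  // dictionary encoding: raw values are ids
      }
    } else {
      for (long v : raw) {
        total += v;              // direct encoding: raw values are the data
      }
    }
    return total;
  }

  public static void main(String[] args) {
    long[] dict = {10, 20, 30};
    long[] raw = {0, 1, 2};
    // Both shapes produce the same answer; only the dispatch cost differs.
    System.out.println(sumHoisted(true, dict, raw));
    System.out.println(sumHoisted(false, null, raw));
  }
}
```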
[jira] [Created] (HIVE-11890) Create ORC module
Owen O'Malley created HIVE-11890: Summary: Create ORC module Key: HIVE-11890 URL: https://issues.apache.org/jira/browse/HIVE-11890 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Start moving classes over to the ORC module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12054) Create vectorized write method
Owen O'Malley created HIVE-12054: Summary: Create vectorized write method Key: HIVE-12054 URL: https://issues.apache.org/jira/browse/HIVE-12054 Project: Hive Issue Type: Sub-task Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley We need to add writer methods that can write VectorizedRowBatch to an ORC file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12055) Create row-by-row shims for the write path
Owen O'Malley created HIVE-12055: Summary: Create row-by-row shims for the write path Key: HIVE-12055 URL: https://issues.apache.org/jira/browse/HIVE-12055 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley As part of removing the row-by-row writer, we'll need to shim out the higher level API (OrcSerde and OrcOutputFormat) so that we maintain backwards compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12066) Add javadoc for methods added to public APIs
Owen O'Malley created HIVE-12066: Summary: Add javadoc for methods added to public APIs Key: HIVE-12066 URL: https://issues.apache.org/jira/browse/HIVE-12066 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Sergey Shelukhin Looking through the changes for ORC, there are methods being added without documentation:
{code}
--- ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
@@ -360,8 +353,18 @@ RecordReader rows(long offset, long length,
   MetadataReader metadata() throws IOException;
+  List getVersionList();
+
+  int getMetadataSize();
+
+  List getOrcProtoStripeStatistics();
+
+  List getStripeStatistics();
+
+  List getOrcProtoFileStatistics();
+
+  DataReader createDefaultDataReader(boolean useZeroCopy);
+
{code}
You really need to look through all of the interfaces and fix them before merging into master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12159) Create vectorized readers for the complex types
Owen O'Malley created HIVE-12159: Summary: Create vectorized readers for the complex types Key: HIVE-12159 URL: https://issues.apache.org/jira/browse/HIVE-12159 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley We need vectorized readers for the complex types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12286) Add option to ORC vectorized reader to not trim spaces from char columns.
Owen O'Malley created HIVE-12286: Summary: Add option to ORC vectorized reader to not trim spaces from char columns. Key: HIVE-12286 URL: https://issues.apache.org/jira/browse/HIVE-12286 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Currently the ORC reader in nextBatch always strips spaces from char columns. Non-Hive applications may find it more natural for the reader not to trim the results, so I propose adding a switch to ReaderOptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
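The trim-vs-preserve choice can be illustrated with plain strings: CHAR(n) values are space-padded to length n, and the proposed switch decides whether a reader sees the padded or the trimmed form. `padTo` and `readChar` below are hypothetical helpers, not the ORC reader API:

```java
public class CharTrimDemo {
  // Pad a value out to the declared CHAR(n) length, as stored on disk.
  static String padTo(String value, int length) {
    StringBuilder sb = new StringBuilder(value);
    while (sb.length() < length) {
      sb.append(' ');
    }
    return sb.toString();
  }

  // Return the stored value either trimmed (Hive's current behavior) or
  // with its CHAR padding preserved (for non-Hive consumers).
  static String readChar(String stored, boolean trimOnRead) {
    if (trimOnRead) {
      int end = stored.length();
      while (end > 0 && stored.charAt(end - 1) == ' ') {
        end--;
      }
      return stored.substring(0, end);
    }
    return stored;
  }

  public static void main(String[] args) {
    String stored = padTo("ab", 5);
    System.out.println("[" + readChar(stored, true) + "]");  // trimmed
    System.out.println("[" + readChar(stored, false) + "]"); // padded
  }
}
```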