Not able to get selective columns in ORC using MapReduce
Hi,

I am reading ORC data using a MapReduce program, with OrcNewInputFormat as the input format. I want to select a few fields out of a 50+ column table while reading. Although I am setting hive.io.file.readcolumn.ids and hive.io.file.read.all.columns, the mapper is still handed all the columns. What am I missing?

Thanks,
Akansha Jain
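For reference: hive.io.file.read.all.columns defaults to true, and while it is true the reader ignores the id list entirely; the ids themselves are zero-based positions in the file schema. A minimal driver sketch — the class name and the column ids 2, 7, 11 are illustrative, not from the original program:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class OrcProjectionDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Zero-based positions of the wanted fields in the ORC schema
            // (2,7,11 are placeholders -- substitute your own).
            conf.set("hive.io.file.readcolumn.ids", "2,7,11");
            // Defaults to true; unless it is explicitly false the reader
            // materializes every column regardless of the id list above.
            conf.setBoolean("hive.io.file.read.all.columns", false);

            Job job = Job.getInstance(conf, "orc-projection");
            job.setJarByClass(OrcProjectionDriver.class);
            job.setInputFormatClass(OrcNewInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // ... set mapper, output format, etc. as in the existing job ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Note that both values have to be set before Job.getInstance(conf, ...) copies the configuration; and, if memory serves, the unselected fields should come back as nulls inside the OrcStruct rather than disappearing from it.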
RE: Hive Bucketing
Thanks for the detailed explanation. Even without bucket pruning, the expectation from bucketing is a performance improvement. I am joining two tables that are bucketed on the same column into the same number of buckets, and comparing the performance with a join of two unbucketed tables on the bucketed column. I am using the MapR distribution, and map-join conversion is enabled by default. Performance is the same in both cases. I increased the data size from 120 GB to 400 GB to see a clear performance difference, but the results are still the same. I have set hive.optimize.bucketmapjoin=true. Any clue why that would be happening?

Thanks,
AJ

On Jan 22, 2016 4:31 PM, "Mich Talebzadeh" wrote:
> Hi,
>
> In general, my understanding is that it will soon be possible to use bucket pruning, much like partition pruning (elimination).
>
> Bucketing in Hive refers to hash partitioning, where a hashing function is applied. Like an RDBMS such as Oracle, Hive applies a linear hashing algorithm to prevent data from clustering within specific partitions. Hashing is very effective if the column selected for bucketing has very high selectivity, like an ID column where selectivity (select count(distinct(column))/count(column)) = 1. In this case the created partitions/files will be as evenly sized as possible. In a nutshell, bucketing is a method to get data evenly distributed over many partitions/files. One should define the number of buckets as a power of two (2^n: 2, 4, 8, 16, etc.) to achieve the best results. Bucketing also helps concurrency in Hive. It may even allow a partition-wise join, i.e. a join between two tables that are bucketed on the same column with the same number of buckets (has anyone tried this?).
>
> One more thing: when one defines the number of buckets at table creation in Hive, the number of partitions/files is fixed. In contrast, with partitioning you do not have this limitation.
>
> Have you considered creating these tables as ORC tables?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
> From: Akansha Jain [mailto:akansha.15au...@gmail.com]
> Sent: 22 January 2016 23:20
> To: user@hive.apache.org
> Subject: RE: Hive Bucketing
>
> Thanks for the response. I am using the MapR 0.13 version. Could you tell me more about bucket pruning?
>
> On Jan 22, 2016 3:09 PM, "Mich Talebzadeh" wrote:
> > Ok, we are talking about bucket pruning here.
> >
> > What version of Hive are you using?
> >
> > Bucket pruning I believe is available from version 2.0.
> >
> > HTH
> >
> > Dr Mich Talebzadeh
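On the join timing question above, one frequent cause worth ruling out on a 0.13-era Hive: the bucket map join silently falls back to a plain join when the files on disk do not actually match the bucket spec, which happens whenever data was loaded without hive.enforce.bucketing=true. A sketch of the end-to-end setup, with made-up table and column names:

    -- Both sides bucketed on the join key, same bucket count.
    CREATE TABLE tx_a (user_id BIGINT, amount DOUBLE)
    CLUSTERED BY (user_id) INTO 256 BUCKETS
    STORED AS ORC;

    CREATE TABLE tx_b (user_id BIGINT, score DOUBLE)
    CLUSTERED BY (user_id) INTO 256 BUCKETS
    STORED AS ORC;

    -- Without this, the INSERT writes however many files the job happens
    -- to produce, and the bucket metadata no longer matches the data.
    SET hive.enforce.bucketing = true;
    INSERT OVERWRITE TABLE tx_a SELECT user_id, amount FROM staging_a;
    INSERT OVERWRITE TABLE tx_b SELECT user_id, score  FROM staging_b;

    SET hive.optimize.bucketmapjoin = true;
    SELECT /*+ MAPJOIN(b) */ a.user_id, a.amount, b.score
    FROM tx_a a JOIN tx_b b ON a.user_id = b.user_id;

Even when the bucket map join does kick in, its main win over a plain map join is memory (each mapper hashes one bucket of the small table at a time rather than the whole table), so identical wall-clock times do not by themselves prove it is off; inspecting the plan with EXPLAIN is a more direct test than timing.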
RE: Hive Bucketing
Thanks for the response. I am using the MapR 0.13 version. Could you tell me more about bucket pruning?

On Jan 22, 2016 3:09 PM, "Mich Talebzadeh" wrote:
> Ok, we are talking about bucket pruning here.
>
> What version of Hive are you using?
>
> Bucket pruning I believe is available from version 2.0.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> From: Akansha Jain [mailto:akansha.15au...@gmail.com]
> Sent: 22 January 2016 21:55
> To: user@hive.apache.org
> Subject: Hive Bucketing
>
> Hi All,
> I have enabled bucketing in a table: 256 buckets on user id. When I query that table (select count(*) from table where userid = 172839393), MapReduce should only need the single bucket file as input to the mappers. But it is taking all files as input, and I see no performance benefit over running the same query against an unbucketed table.
>
> Do I have to set any property before running queries on bucketed tables? I tried a join query as well, but saw no performance improvement. In fact, I think it takes a few seconds more than the unbucketed table.
>
> Thanks,
> AJ
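For what it's worth, the 2.0 feature referred to above is, as far as I know, gated by a flag of its own on the Tez execution path; a sketch, reusing the table from the question quoted above (my_table stands in for the bucketed table):

    -- Hive 2.0+ on Tez; no effect on 0.13/MapReduce.
    SET hive.tez.bucket.pruning = true;
    -- With it on, an equality predicate on the bucketing column lets the
    -- planner read just the one matching bucket file out of 256:
    SELECT count(*) FROM my_table WHERE userid = 172839393;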
Hive Bucketing
Hi All,

I have enabled bucketing in a table: I created 256 buckets on user id. When I query that table (select count(*) from table where userid = 172839393), MapReduce should only need the single bucket file as input to the mappers. But it is taking all files as input, and I see no performance benefit over running the same query against an unbucketed table.

Do I have to set any property before running queries on bucketed tables? I tried a join query as well, but saw no performance improvement. In fact, I think it takes a few seconds more than the unbucketed table.

Thanks,
AJ
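On releases without bucket pruning, the one way to restrict a query to a single bucket is to name it with TABLESAMPLE: when the ON column matches the CLUSTERED BY column, Hive scans only the sampled bucket's files instead of the whole table. A sketch against the 256-bucket table from the question (the table name is illustrative, and the bucket number has to be hash(userid) mod 256 + 1 for the key being probed — if integer keys hash to themselves, as I believe they do, 172839393 would land in bucket 226):

    SELECT count(*)
    FROM my_table TABLESAMPLE (BUCKET 226 OUT OF 256 ON userid) t
    WHERE t.userid = 172839393;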
Adding JAR in Hive classpath
Hi All,

I am facing an issue with the Hive classpath. I have written a UDAF that uses commons-math version 3.3, so when creating the temporary function I first ADD JAR commons-math 3.3, then the UDAF jar, and then create the temporary function. Another version, commons-math 3.1, is present under the HADOOP_HOME/lib directory. The problem is that even after adding commons-math 3.3 to the Hive classpath (via ADD JAR ...), Hive picks up the commons-math 3.1 version from HADOOP_HOME/lib. How do I remove the 3.1 version from the classpath? I tried DELETE JAR ..., but it doesn't work. Is there any way I can force Hive to pick my version rather than the one in the Hadoop lib?

Any help is appreciated.

Thanks,
AJ
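The likely mechanics: ADD JAR puts the jar in the session classloader, which is a child of the JVM classloader that already holds HADOOP_HOME/lib, and parent-first delegation means the 3.1 classes keep winning; DELETE JAR only undoes an earlier ADD JAR, so it cannot touch the Hadoop copy. The usual workarounds, sketched with illustrative paths and a hypothetical function class:

    # Client side: put the newer jar on the Hadoop classpath and ask the
    # launcher scripts to order it ahead of HADOOP_HOME/lib.
    export HADOOP_CLASSPATH=/path/to/commons-math3-3.3.jar
    export HADOOP_USER_CLASSPATH_FIRST=true
    hive

    -- Then, inside the session, do the same for the map/reduce tasks
    -- (Hadoop 2 property name) before registering the function:
    SET mapreduce.job.user.classpath.first=true;
    ADD JAR /path/to/commons-math3-3.3.jar;
    ADD JAR /path/to/my-udaf.jar;
    CREATE TEMPORARY FUNCTION my_udaf AS 'com.example.MyUDAF';

The more robust fix is to shade the dependency: relocate org.apache.commons.math3 to a private package inside the UDAF jar (for example with the maven-shade-plugin), so the version shipped on the cluster no longer matters.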
Fwd: Future date getting converted to epoch date with windowing function
Hi,

I am trying to use Hive windowing functions for a business use case. The Hive version is Apache Hive 0.11. I have a table with a column end_date whose value is 2999-12-31. When this value passes through a windowing function, Hive converts it to a 1970s-era date.

Query used:

    SELECT account_id, device_id, status,
           LEAD(status) OVER (PARTITION BY device_id ORDER BY start_date DESC) prev_status,
           start_date, end_date
    FROM my_table;

Sample data:

    account_id  device_id  status  primary_min  start_date           end_date
    9           111        2       111          2012-08-29 00:00:00  2013-08-14 00:00:00
    9           111        5       111          2013-08-15 00:00:00  2013-08-15 00:00:00
    9           111        4       111          2013-08-16 00:00:00  2013-11-30 00:00:00
    9           111        4       111          2013-12-01 00:00:00  2013-12-01 00:00:00
    9           111        4       111          2013-12-02 00:00:00  2014-01-15 00:00:00
    9           111        4       111          2014-01-16 00:00:00  2999-12-31 00:00:00

Output:

    account_id  device_id  status  prev_status  start_date           end_date
    9           111        2       NULL         2012-08-29 00:00:00  2013-08-14 00:00:00
    9           111        5       2            2013-08-15 00:00:00  2013-08-15 00:00:00
    9           111        4       5            2013-08-16 00:00:00  2013-11-30 00:00:00
    9           111        4       4            2013-12-01 00:00:00  2013-12-01 00:00:00
    9           111        4       4            2013-12-02 00:00:00  2014-01-15 00:00:00
    9           111        4       4            2014-01-16 00:00:00  1979-03-26 23:28:00

Here, the date 2999-12-31 was converted to 1979-03-26. I have tried converting the column type to STRING, but that did not help. Please let me know if anyone has faced the same issue and resolved it.

Thanks in advance,
Akansha
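The corrupted value itself points at the cause: 2999-12-31 00:00:00 UTC is 32,503,593,600 seconds after the epoch, which does not fit in 32 bits, and keeping only the low 31 bits lands exactly on the value in the output. A small check (the timestamp is assumed to be interpreted as UTC here):

    import java.sql.Timestamp;
    import java.util.TimeZone;

    public class WrapCheck {
        public static void main(String[] args) {
            TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
            long epochSeconds = 32_503_593_600L;        // 2999-12-31 00:00:00 UTC
            long wrapped = epochSeconds & 0x7FFF_FFFFL; // low 31 bits: 291,338,880
            // Prints 1979-03-26 23:28:00.0 -- the same value the window
            // function produced, consistent with the timestamp's seconds being
            // squeezed through a signed 32-bit field somewhere in the 0.11
            // windowing path.
            System.out.println(new Timestamp(wrapped * 1000L));
        }
    }

If that is indeed what is happening, casting in the query will not help, since the truncation occurs inside the windowing machinery rather than in the projection; moving to a release newer than 0.11 is the realistic fix.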