Not able to get selective columns in ORC using MapReduce

2016-02-18 Thread Akansha Jain
Hi,
I am reading ORC files in a MapReduce program, using OrcNewInputFormat as
the input format. I want to select only a few fields out of a 50+ column
table while reading. Although I am setting hive.io.file.readcolumn.ids and
hive.io.file.read.all.columns, all columns are still being selected in the
mapper. What am I missing?
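
For reference, this is roughly how my driver sets the properties (a
minimal sketch; the column ids 0, 3 and 7, the input path argument and the
class name OrcProjectionDriver are placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

  public class OrcProjectionDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Turn off "read all columns" and list the zero-based ids of the
      // columns to project; both must be set on the configuration before
      // the job (and hence the ORC record readers) is created.
      conf.setBoolean("hive.io.file.read.all.columns", false);
      conf.set("hive.io.file.readcolumn.ids", "0,3,7");

      Job job = Job.getInstance(conf, "orc-column-projection");
      job.setJarByClass(OrcProjectionDriver.class);
      job.setInputFormatClass(OrcNewInputFormat.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      // ... mapper, output key/value classes and output path as usual ...
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }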

Thanks,
Akansha jain


-- 
-Akansha


RE: Hive Bucketing

2016-01-25 Thread Akansha Jain
Thanks for the detailed explanation. Even without bucket pruning, the
expectation from bucketing is a performance improvement. I am joining two
tables that are bucketed on the same column into the same number of
buckets, and comparing the performance with a join of two unbucketed
tables on the bucketing column. I am using the MapR distribution, and map
join conversion is enabled by default. Performance is the same in both
cases. I increased the data size from 120 GB to 400 GB to see a clear
performance difference, but the results are still the same. I have set
hive.optimize.bucketmapjoin=true.

Any clue why that would be happening?
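
For reference, the test looks roughly like this (a sketch; table and
column names are placeholders):

  SET hive.optimize.bucketmapjoin = true;

  -- both tables bucketed on user_id into the same number of buckets
  SELECT COUNT(*)
  FROM orders_bucketed o
  JOIN users_bucketed u
    ON o.user_id = u.user_id;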

Thanks,
AJ
On Jan 22, 2016 4:31 PM, "Mich Talebzadeh"  wrote:

> Hi,
>
> In general, my understanding is that it will soon be possible to use
> bucket pruning, much like partition pruning (elimination).
>
> Bucketing in Hive refers to hash partitioning, where a hashing function
> is applied. As in an RDBMS such as Oracle, Hive will apply a linear
> hashing algorithm to prevent data from clustering within specific
> partitions. Hashing is very effective if the column selected for
> bucketing has very high selectivity, like an ID column where selectivity
> (select count(distinct(column))/count(column)) = 1. In this case, the
> created partitions/files will be as evenly sized as possible. In a
> nutshell, bucketing is a method to get data evenly distributed over many
> partitions/files. One should define the number of buckets as a power of
> two -- 2^n, like 2, 4, 8, 16 etc. -- to achieve the best results.
> Bucketing will also help concurrency in Hive. It may even allow a
> partition-wise join, i.e. a join between two tables that are bucketed on
> the same column with the same number of buckets (has anyone tried this?).
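>
> As a sketch (the table and column names and the bucket count are
> illustrative):
>
>   -- selectivity check on the candidate bucketing column
>   SELECT COUNT(DISTINCT user_id) / COUNT(user_id) FROM user_events;
>
>   CREATE TABLE user_events_b (
>     user_id BIGINT,
>     event   STRING
>   )
>   CLUSTERED BY (user_id) INTO 16 BUCKETS
>   STORED AS ORC;
>
>   -- in releases before Hive 2.0 this must, if I remember correctly, be
>   -- set so that inserts really produce one file per bucket
>   SET hive.enforce.bucketing = true;
>   INSERT OVERWRITE TABLE user_events_b
>   SELECT user_id, event FROM user_events;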
>
> One more thing: when one defines the number of buckets at table creation
> time in Hive, the number of partitions/files is fixed. In contrast, with
> partitioning you do not have this limitation.
>
> Have you considered creating these tables as ORC tables?
>
>
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
>
> From: Akansha Jain [mailto:akansha.15au...@gmail.com]
> Sent: 22 January 2016 23:20
> To: user@hive.apache.org
> Subject: RE: Hive Bucketing
>
>
>
> Thanks for the response. I am using the MapR version of Hive 0.13. Could
> you tell me more about bucket pruning?
>
> On Jan 22, 2016 3:09 PM, "Mich Talebzadeh"  wrote:
>
> OK, we are talking about bucket pruning here.
>
> What version of Hive are you using?
>
> Bucket pruning, I believe, is available from version 2.0.
>
>
>
> HTH
>
>
>
> Dr Mich Talebzadeh

RE: Hive Bucketing

2016-01-22 Thread Akansha Jain
Thanks for the response. I am using the MapR version of Hive 0.13. Could
you tell me more about bucket pruning?
On Jan 22, 2016 3:09 PM, "Mich Talebzadeh"  wrote:

> OK, we are talking about bucket pruning here.
>
> What version of Hive are you using?
>
> Bucket pruning, I believe, is available from version 2.0.
>
>
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> From: Akansha Jain [mailto:akansha.15au...@gmail.com]
> Sent: 22 January 2016 21:55
> To: user@hive.apache.org
> Subject: Hive Bucketing
>
>
>
> Hi All,
> I have enabled bucketing on a table: I created 256 buckets on user id.
> Now, when I query that table (select count(*) from table where userid =
> 172839393), MapReduce should use only a single bucket file as input to
> the mappers. But it considers all files as input to the mapper, and I do
> not see any performance benefit over running the same query on an
> unbucketed table.
>
> Do I have to set any property before running queries on bucketed tables?
> I tried a join query as well, but saw no performance improvement. In
> fact, I think it takes a few seconds longer than the unbucketed table.
>
> Thanks,
> AJ
>


Hive Bucketing

2016-01-22 Thread Akansha Jain
Hi All,
I have enabled bucketing on a table: I created 256 buckets on user id.
Now, when I query that table (select count(*) from table where userid =
172839393), MapReduce should use only a single bucket file as input to the
mappers. But it considers all files as input to the mapper, and I do not
see any performance benefit over running the same query on an unbucketed
table.

Do I have to set any property before running queries on bucketed tables? I
tried a join query as well, but saw no performance improvement. In fact, I
think it takes a few seconds longer than the unbucketed table.
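
For reference, the setup looks roughly like this (a sketch; the table and
column names are placeholders):

  CREATE TABLE events (
    userid  BIGINT,
    payload STRING
  )
  CLUSTERED BY (userid) INTO 256 BUCKETS;

  SELECT COUNT(*) FROM events WHERE userid = 172839393;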

Thanks,
AJ


Adding JAR in Hive classpath

2015-09-04 Thread Akansha Jain
Hi All,

I am facing an issue with the Hive classpath. I have written a UDAF that
uses Apache Commons Math version 3.3, so when creating the temporary
function I first add the Commons Math 3.3 jar, then the UDAF jar, and then
create the temporary function.
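
The sequence I run is roughly this (the paths, function name and class
name are placeholders):

  ADD JAR /path/to/commons-math3-3.3.jar;
  ADD JAR /path/to/my-udaf.jar;
  CREATE TEMPORARY FUNCTION my_udaf AS 'com.example.MyUDAF';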

There is another version, Commons Math 3.1, present under the
HADOOP_HOME/lib directory.

The problem is that even after adding Commons Math 3.3 to the Hive
classpath (via ADD JAR ...), Hive picks up Commons Math 3.1 from the
HADOOP_HOME/lib folder. How do I remove the 3.1 version from the
classpath?

I tried using DELETE JAR ..., but it doesn't work.

Is there any way I can force Hive to pick my version rather than the one
in the Hadoop lib directory? Any help is appreciated.

Thanks

AJ


Fwd: Future date getting converted to epoch date with windowing function

2014-03-25 Thread Akansha Jain
Hi,
I am trying to use Hive windowing functions for a business use case. The
Hive version is Apache Hive 0.11.
I have a table with a column end_date whose value is 2999-12-31. When
using a Hive windowing function with this value, Hive converts it to a
1970s date.

Query used:

SELECT account_id,
       device_id,
       status,
       LEAD(status) OVER (PARTITION BY device_id ORDER BY start_date DESC)
         prev_status,
       start_date,
       end_date
FROM my_table;

Sample data:

account_id  device_id  status  primary_min  start_date           end_date
9           111        2       111          2012-08-29 00:00:00  2013-08-14 00:00:00
9           111        5       111          2013-08-15 00:00:00  2013-08-15 00:00:00
9           111        4       111          2013-08-16 00:00:00  2013-11-30 00:00:00
9           111        4       111          2013-12-01 00:00:00  2013-12-01 00:00:00
9           111        4       111          2013-12-02 00:00:00  2014-01-15 00:00:00
9           111        4       111          2014-01-16 00:00:00  2999-12-31 00:00:00

Output:

account_id  device_id  status  prev_status  start_date           end_date
9           111        2       NULL         2012-08-29 00:00:00  2013-08-14 00:00:00
9           111        5       2            2013-08-15 00:00:00  2013-08-15 00:00:00
9           111        4       5            2013-08-16 00:00:00  2013-11-30 00:00:00
9           111        4       4            2013-12-01 00:00:00  2013-12-01 00:00:00
9           111        4       4            2013-12-02 00:00:00  2014-01-15 00:00:00
9           111        4       4            2014-01-16 00:00:00  1979-03-26 23:28:00

Here, the date 2999-12-31 got converted to 1979-03-26. I have tried
converting the date type to String, but it did not help.
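
The String conversion I tried looked roughly like this (a sketch):

  SELECT account_id,
         device_id,
         status,
         LEAD(status) OVER (PARTITION BY device_id ORDER BY start_date DESC)
           prev_status,
         start_date,
         CAST(end_date AS STRING) end_date
  FROM my_table;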

Please let me know if anyone has faced the same issue and resolved it.

Thanks in advance,

Akansha