Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
gd...@outlook.com> *Sent:* Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bucketed table. We have load

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
m > > > > On 16 May 2016 at 23:53, mahender bigdata > wrote: > >> I'm on Hive 1.2 >> >> On 5/16/2016 12:02 PM, Matthew McCline wrote: >> >> What version of Hive are you on? >> >> >> --

Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
om> *Sent:* Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bucketed table. We have loaded almost 6 months data and here month i

Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
f Hive are you on? *From:* Mahender Sarangam <mahender.bigd...@outlook.com> *Sent:* Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Jörn Franke
r bigdata >> wrote: >> I'm on Hive 1.2 >> >>> On 5/16/2016 12:02 PM, Matthew McCline wrote: >>> What version of Hive are you on? >>> >>> From: Mahender Sarangam >>> Sent: Saturday, May 14, 2016 3:29 PM >>>

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
atthew McCline wrote: > > What version of Hive are you on? > > > -- > *From:* Mahender Sarangam > > *Sent:* Saturday, May 14, 2016 3:29 PM > *To:* user@hive.apache.org > *Subject:* Query Failing while querying on ORC Format > > Hi,

Re: Query Failing while querying on ORC Format

2016-05-16 Thread mahender bigdata
iling while querying on ORC Format Hi, We are dumping our data into an ORC partitioned, bucketed table. We have loaded almost 6 months of data, with month as the partition column. We have since modified the table schema by adding 2 more columns to the ORC table. Now whenever we are ru

Re: Query Failing while querying on ORC Format

2016-05-16 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 16 May 2016 at 20:02, Matthew McCline wrote: > > What version of Hive are you on? > > > -- > *From:* Mahender Sarangam > *Sent:* Saturday, May 14, 2016 3:29 PM > *To:* user@hive.apache.org > *Subjec

Re: Query Failing while querying on ORC Format

2016-05-16 Thread Matthew McCline
What version of Hive are you on? From: Mahender Sarangam Sent: Saturday, May 14, 2016 3:29 PM To: user@hive.apache.org Subject: Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bucketed table. We have loaded almost 6

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
Hi Mich, sorry, the link is not pointing to the right location. On 5/15/2016 1:25 PM, Mich Talebzadeh wrote: Hi Mahender, Please check this thread https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2 HTH Dr Mich Talebzadeh LinkedIn /https://w

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
As a temporary workaround, I'm disabling vectorization on the ORC table; with that, the query works. On 5/15/2016 3:38 PM, mahender bigdata wrote: here is the error message https://issues.apache.org/jira/browse/HIVE-10598 Error: java.lang.RuntimeException: Error creating a batch at org.apache.hadoop.hive.ql.io.or
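The workaround described above can be sketched as a session-level setting. `hive.vectorized.execution.enabled` is a standard Hive property; the table name `orc_table` and the `month` filter value are hypothetical, since the thread does not give them.

```sql
-- Turn off vectorized execution for the current session only.
SET hive.vectorized.execution.enabled=false;

-- Hypothetical query against an older partition
-- (table name and partition value are assumptions, not from the thread).
SELECT * FROM orc_table WHERE month = '2016-01' LIMIT 10;
```

Because the setting is per session, other queries keep vectorization unless they also disable it.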

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
here is the error message https://issues.apache.org/jira/browse/HIVE-10598 Error: java.lang.RuntimeException: Error creating a batch at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) at org.apache.hadoop.hive.

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
here is the error message Error: java.lang.RuntimeException: Error creating a batch at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcR

Re: Query Failing while querying on ORC Format

2016-05-15 Thread Mich Talebzadeh
Hi Mahender, Please check this thread https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
Hi Mich, Is there a link missing? We have already added the columns. Somehow, retrieving the old partition data with the new columns is failing. /mahens On 5/14/2016 4:15 PM, Mich Talebzadeh wrote: that might help

Re: Query Failing while querying on ORC Format

2016-05-14 Thread Mich Talebzadeh
Check this thread: alter table add columns aternatives or hive refresh. That might help. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * ht

Query Failing while querying on ORC Format

2016-05-14 Thread Mahender Sarangam
Hi, We are dumping our data into an ORC partitioned, bucketed table. We have loaded almost 6 months of data, with month as the partition column. We have since modified the table schema by adding 2 more columns to the ORC table. Now whenever we are running a select statement for olde

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
ur insert query? I suspect sorted >>> dynamic partition optimization is bailing out because of >>> the constant value for 'dt' column. If you are not seeing a reducer then >>> it's likely not using the sorted dynamic partition optimization. >>> You are probab
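A minimal sketch of how one might check for the reducer mentioned above. The two `SET` properties are standard Hive settings; the table names `source_table` and `target_orc` and the column `col1` are hypothetical, while the partition columns follow the original post.

```sql
-- Allow all partition columns to be dynamic, and request the
-- sorted dynamic partition optimization.
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition=true;

-- Inspect the plan; a reduce stage suggests the optimization is in use.
-- Table and data column names are assumptions for illustration.
EXPLAIN
INSERT OVERWRITE TABLE target_orc PARTITION (dt, title, title_type)
SELECT col1, dt, title, title_type
FROM source_table;
```

If the plan shows only map stages, the optimization is probably not being applied to this insert.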

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Prasanth Jayachandran
l.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynamic partitions Hi all, I am using Hive 1.0.1 and trying to do a simple insert into an ORC table, creating dynamic partitions. I am selecting from

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
es.apache.org/jira/browse/HIVE-12893 >> I can confirm if that's the case by looking at the explain plan. >> >> Thanks >> Prasanth >> >> On May 2, 2016, at 2:24 PM, Ryan Harris >> wrote: >> >> reading this: >> "but when I add 2000 new title

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Prasanth Jayachandran
namic partitions that each have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
-only maybe date + > title_type, but adding 2000+ dynamic partitions that each have 300 rows of > data in them is asking for problems in Hive IMO... > > > *From:* Matt Olson [mailto:maolso...@gmail.com ] > *Sent:* Friday, April 29, 2016 7:50 PM > *To:* user@hive.apache.org

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Prasanth Jayachandran
ch have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynamic partitions Hi all,

RE: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Ryan Harris
+ title_type, but adding 2000+ dynamic partitions that each have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynam

Re: Container out of memory: ORC format with many dynamic partitions

2016-04-30 Thread Gopal Vijayaraghavan
> SET hive.exec.orc.memory.pool=1.0; This might be a bad idea in general; it causes more OOMs rather than fewer. > SET mapred.map.child.java.opts=-Xmx2048M; > SET mapred.child.java.opts=-Xmx2048M; ... > Container >[pid=6278,containerID=container_e26_1460661845156_49295_01_000244] is >running beyond physic

Re: Container out of memory: ORC format with many dynamic partitions

2016-04-29 Thread Jörn Franke
I would still need some time to dig deeper into this. Are you using a specific distribution? Would it be possible to upgrade to a more recent Hive version? However, having so many small partitions is a bad practice which seriously affects performance. Each partition should at least contain several

Container out of memory: ORC format with many dynamic partitions

2016-04-29 Thread Matt Olson
Hi all, I am using Hive 1.0.1 and trying to do a simple insert into an ORC table, creating dynamic partitions. I am selecting from a table partitioned by dt and category, and inserting into a table partitioned by dt, title, and title_type. Other than the partitioning, the tables have the same sche

Re: reading ORC format on Spark-SQL

2016-02-11 Thread Philip Lee
particular point because of Amdahl's law. This sentence is a bit confusing. So the time to read a CSV file on Spark increases linearly as the data grows because it employs the full cluster, which means it runs out of capacity? On the other hand, the reason why time of reading ORC format

Re: reading ORC format on Spark-SQL

2016-02-10 Thread Philip Lee
particular point because of Amdahl's law. This sentence is a bit confusing. So the time to read a CSV file on Spark increases linearly as the data grows because it employs the full cluster, which means it runs out of capacity? On the other hand, the reason why time of reading ORC format

RE: reading ORC format on Spark-SQL

2016-02-10 Thread Mich Talebzadeh
aghavan Sent: 10 February 2016 21:43 To: user@hive.apache.org Subject: Re: reading ORC format on Spark-SQL > The reason why I am asking this kind of question is reading csv file on >Spark is linearly increasing as the data size increases a bit, but reading >ORC format on Spark-SQL

Re: reading ORC format on Spark-SQL

2016-02-10 Thread Gopal Vijayaraghavan
> The reason why I am asking this kind of question is reading csv file on >Spark is linearly increasing as the data size increases a bit, but reading >ORC format on Spark-SQL is still same as the data size increases in >. ... > This cause is from (just property of reading ORC forma

RE: reading ORC format on Spark-SQL

2016-02-10 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 10 February 2016 20:39 To: user@hive.apache.org Subject: reading ORC format on Spark-SQL What steps are involved when reading ORC format on Spark-SQL? I mean, reading a CSV file usually just reads the dataset directly into memory.

reading ORC format on Spark-SQL

2016-02-10 Thread Philip Lee
What steps are involved when reading ORC format on Spark-SQL? I mean, reading a CSV file usually just reads the dataset directly into memory. But I feel like Spark-SQL has some extra steps when reading ORC format. For example, they have to create table to insert the dataset? and then they insert the

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility. From: Philip Lee [mailto:philjj...@gmail.com] Sent: 02 February 2016 16:10 To: user@hive.apache.o

Re: ORC format

2016-02-02 Thread Philip Lee
> *From:* Lefty Leverenz [mailto:leftylever...@gmail.com] > *Sent:* 02 February 2016 10:26 > > *To:* user

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: 02 February 2016 10:26 To: user@hive.apache.org Subject: Re: ORC format Can't resist teasing Mich about this: "Indeed one often demoralises data taking advantages of massive parallel processing in Hive."

Re: ORC format

2016-02-02 Thread Lefty Leverenz
ale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries nor their > employees accept any responsibility. > &g

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Alan Gates [mailto:alanfga...@gmail.com] Sent: 01 February 2016 17:07 To: user@hive.apache.org Subject: Re: ORC format ORC does not currently expose a primary key to the user, though we have talked of having it do that. As Mich says the i

Re: ORC format

2016-02-01 Thread Alan Gates
curious of some things. I know ORC format is faster on filtering or reading because it has indexing. Has it advantage of joining two tables of ORC dataset as well? Could you explain about it in detail? When experimenting, it seems like it has some advantages of joining in some aspect, but no

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees a

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 01 February 2016 15:21 To: user@hive.apache.org Subject: ORC format Hello, I experi

Re: ORC format

2016-02-01 Thread Philip Lee
r endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries nor their > employees accept any res

Re: ORC format

2016-02-01 Thread Philip Lee
> I experiment the performance of some systems between ORC and CSV file. > I read about ORC documentation on Hive website, but still curious of some > things. > > I know ORC format is faster on filtering or reading because it has > indexing. > Has it advantage of joining t

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 01 February 2016 15:27 To: user@hive.apache.org Subject: Re: ORC format Also, when creating ORC from CSV, is an index key made for every column, or is a primary key made for the table? If keys ar

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 01 February 2016 15:21 To: user@hive.apache.org Subject: ORC format Hell

ORC format

2016-02-01 Thread Philip Lee
Hello, I am experimenting with the performance of some systems using ORC and CSV files. I read the ORC documentation on the Hive website, but I am still curious about some things. I know ORC format is faster for filtering or reading because it has indexing. Does it also have an advantage for joining two tables of ORC dataset as

Is it worth of using ORC format in my case. Can I replace hive with HBase.

2015-08-06 Thread venkatesh b
many intermediate tables generated, around 50, while processing. Until now we have used text format for storage. We came across the ORC file format. Since each table is queried only once, I would like to know whether it is worth storing it in ORC format. Here we do full table scans. SECOND: I came to know about

Re: Benefit of ORC format storing Sum, Min, Max...

2015-05-29 Thread Gopal Vijayaraghavan
> I am new to Hive, please help me understand the benefit of ORC file >format storing Sum, Min, Max values. > Whenever we try to find a sum of values in a particular column, it still >runs the MapReduce job. ORC uses row-indexes to constrain filtering. What you're looking at is the ORC file foo

Re: Benefit of ORC format storing Sum, Min, Max...

2015-05-29 Thread Jagat Singh
Did you compute table column stats? On 30 May 2015 9:04 am, "sreejesh s" wrote: > Hi, > > I am new to Hive, please help me understand the benefit of ORC file format > storing Sum, Min, Max values. > Whenever we try to find a sum of values in a particular column, it still > runs the MapReduce job. > > s
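A minimal sketch of the column-stats step asked about above, using the `orctable` and `col1` names from the original post. `ANALYZE TABLE ... FOR COLUMNS` and `hive.compute.query.using.stats` are standard Hive features; whether a given aggregate is actually answered from statistics depends on the Hive version and the query, and the thread does not establish that `sum` qualifies, so the example uses `min` and `max`.

```sql
-- Compute column-level statistics for the column being aggregated.
ANALYZE TABLE orctable COMPUTE STATISTICS FOR COLUMNS col1;

-- Let Hive answer simple aggregates from stored statistics where it can.
SET hive.compute.query.using.stats=true;

-- Assumption: min/max are candidates for a stats-only answer; verify with EXPLAIN.
SELECT min(col1), max(col1) FROM orctable;
```

Running `EXPLAIN` on the query shows whether a MapReduce stage is still planned.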

Benefit of ORC format storing Sum, Min, Max...

2015-05-29 Thread sreejesh s
Hi, I am new to Hive, please help me understand the benefit of ORC file format storing Sum, Min, Max values. Whenever we try to find a sum of values in a particular column, it still runs the MapReduce job. select sum(col1) from orctable; select sum(col1) from txttable; For a sample file with around

Re: Error using ORC Format with Hive

2014-04-05 Thread Prasanth Jayachandran
ator.close(Operator.java:596) >> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613) >> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613) >> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613) >> at >

Re: Error using ORC Format with Hive

2014-04-05 Thread Amit Tewari
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207) > > Amit > > >> On 4/4/14 2:28 PM, Amit Tewari wrote: >> Hi All, >> >> I am just trying to do some simple tests to see speedup in hive query with >> Hive 0.14 (trunk version this

Re: Error using ORC Format with Hive

2014-04-04 Thread Bryan Jeffrey
hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception "File could only be repl

Re: Error using ORC Format with Hive

2014-04-04 Thread Amit Tewari
o see how much I can speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception "File could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in

Error using ORC Format with Hive

2014-04-04 Thread Amit Tewari
Hi All, I am just trying to do some simple tests to see speedup in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data int

Error using ORC format

2014-04-04 Thread Amit Tewari
Hi All, I am just trying to do some simple tests to see speedup in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data int