Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
gd...@outlook.com> *Sent:* Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data int

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
Matthew McCline wrote: >> >> What version of Hive are you on? >> >> -- >> *From:* Mahender Sarangam <mahender.bigd...@outlook.com> >> *Sent:* Saturday, May 14, 20

Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bucketed table. We have loaded almost 6 months data and here month is Partition by column

Re: Query Failing while querying on ORC Format

2016-05-17 Thread mahender bigdata
rangam <mahender.bigd...@outlook.com> *Sent:* Saturday, May 14, 2016 3:29 PM *To:* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Parti

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Jörn Franke
>>> From: Mahender Sarangam <mahender.bigd...@outlook.com> >>> Sent: Saturday, May 14, 2016 3:29 PM >>> To: user@hive.apache.org >>> Subject: Query Failing while querying on ORC Format >>> >>> Hi, >>> We are dumping our

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
> -- > *From:* Mahender Sarangam <mahender.bigd...@outlook.com> > *Sent:* Saturday, May 14, 2016 3:29 PM > *To:* user@hive.apache.org > *Subject:* Query Failing while querying on ORC Format > > Hi, > We

Re: Query Failing while querying on ORC Format

2016-05-16 Thread mahender bigdata
* user@hive.apache.org *Subject:* Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bucketed table. We have loaded almost 6 months data and here month is Partition by column. Now we have modified ORC partition bucketed table schema. We have added 2 more columns to t

Re: Query Failing while querying on ORC Format

2016-05-16 Thread Mich Talebzadeh
> wrote: > > What version of Hive are you on? > > -- > *From:* Mahender Sarangam <mahender.bigd...@outlook.com> > *Sent:* Saturday, May 14, 2016 3:29 PM > *To:* user@hive.apache.org > *Subject:* Query Failing while querying on

Re: Query Failing while querying on ORC Format

2016-05-16 Thread Matthew McCline
What version of Hive are you on? From: Mahender Sarangam <mahender.bigd...@outlook.com> Sent: Saturday, May 14, 2016 3:29 PM To: user@hive.apache.org Subject: Query Failing while querying on ORC Format Hi, We are dumping our data into ORC Partition Bu

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
Hi Mich, sorry, the link is not pointing to the right location. On 5/15/2016 1:25 PM, Mich Talebzadeh wrote: Hi Mahender, Please check this thread https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2 HTH Dr Mich Talebzadeh LinkedIn

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
As a temporary workaround, I'm disabling vectorization on the ORC table; the query then works. On 5/15/2016 3:38 PM, mahender bigdata wrote: here is the error message https://issues.apache.org/jira/browse/HIVE-10598 Error: java.lang.RuntimeException: Error creating a batch at

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
here is the error message https://issues.apache.org/jira/browse/HIVE-10598 Error: java.lang.RuntimeException: Error creating a batch at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) at

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
here is the error message Error: java.lang.RuntimeException: Error creating a batch at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) at

Re: Query Failing while querying on ORC Format

2016-05-15 Thread Mich Talebzadeh
Hi Mahender, Please check this thread https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Query Failing while querying on ORC Format

2016-05-15 Thread mahender bigdata
Hi Mich, Is there a link missing? We have already added the column. Somehow the old partition data with the new column is failing to be retrieved. /mahens On 5/14/2016 4:15 PM, Mich Talebzadeh wrote: that might help

Re: Query Failing while querying on ORC Format

2016-05-14 Thread Mich Talebzadeh
Check this thread: "alter table add columns aternatives or hive refresh". That might help. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Query Failing while querying on ORC Format

2016-05-14 Thread Mahender Sarangam
Hi, We are loading our data into a partitioned, bucketed ORC table. We have loaded almost 6 months of data, with month as the partition column. We have now modified the schema of this table by adding 2 more columns to the ORC table. Now whenever we run a select statement for

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
>> Can you please post the explain plan for your insert query? I suspect sorted >>> dynamic partition optimization is bailing out because of >>> the constant value for the 'dt' column. If you are not seeing a reducer then >>> it's likely not using the sorted dynamic partition optimi

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Prasanth Jayachandran
ions that each have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynamic partiti

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
: >> "but when I add 2000 new titles with 300 rows each" >> I'm thinking that you are over-partitioning your data >> I'm not sure exactly how that relates to the OOM error you are getting >> (it may not) I'd test things out partitioning by date-only may

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Matt Olson
oning by date-only maybe date + > title_type, but adding 2000+ dynamic partitions that each have 300 rows of > data in them is asking for problems in Hive IMO... > > > *From:* Matt Olson [mailto:maolso...@gmail.com] > *Sent:* Friday, April 29,

Re: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Prasanth Jayachandran
ons that each have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynamic partitio

RE: Container out of memory: ORC format with many dynamic partitions

2016-05-02 Thread Ryan Harris
but adding 2000+ dynamic partitions that each have 300 rows of data in them is asking for problems in Hive IMO... From: Matt Olson [mailto:maolso...@gmail.com] Sent: Friday, April 29, 2016 7:50 PM To: user@hive.apache.org Subject: Container out of memory: ORC format with many dynamic partitions

Re: Container out of memory: ORC format with many dynamic partitions

2016-04-30 Thread Gopal Vijayaraghavan
> SET hive.exec.orc.memory.pool=1.0; Might be a bad idea in general, this causes more OOMs than less. > SET mapred.map.child.java.opts=-Xmx2048M; > SET mapred.child.java.opts=-Xmx2048M; ... > Container >[pid=6278,containerID=container_e26_1460661845156_49295_01_000244] is >running beyond

Re: Container out of memory: ORC format with many dynamic partitions

2016-04-30 Thread Jörn Franke
I would still need some time to dig deeper in this. Are you using a specific distribution? Would it be possible to upgrade to a more recent Hive version? However, having so many small partitions is a bad practice which seriously affects performance. Each partition should at least contain

Container out of memory: ORC format with many dynamic partitions

2016-04-29 Thread Matt Olson
Hi all, I am using Hive 1.0.1 and trying to do a simple insert into an ORC table, creating dynamic partitions. I am selecting from a table partitioned by dt and category, and inserting into a table partitioned by dt, title, and title_type. Other than the partitioning, the tables have the same

Re: reading ORC format on Spark-SQL

2016-02-11 Thread Philip Lee
a particular point because of Amdahl's law. This sentence is a bit confusing. So the time of reading a CSV file on Spark is linearly increasing as the data increases, because it employs the full cluster, which means it runs out of capacity? On the other hand, the reason why time of reading ORC format shows

reading ORC format on Spark-SQL

2016-02-10 Thread Philip Lee
What steps are involved when reading ORC format on Spark-SQL? I mean, reading a CSV file usually just reads the dataset directly into memory. But I feel like Spark-SQL has some steps when reading ORC format. For example, do they have to create a table to insert the dataset? And then they insert

RE: reading ORC format on Spark-SQL

2016-02-10 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 10 February 2016 20:39 To: user@hive.apache.org Subject: reading ORC format on Spark-SQL What kind of steps exists when reading ORC format on Spark-SQL? I meant usually reading csv file is just directly reading the dataset on memory.

RE: reading ORC format on Spark-SQL

2016-02-10 Thread Mich Talebzadeh
ruary 2016 21:43 To: user@hive.apache.org Subject: Re: reading ORC format on Spark-SQL > The reason why I am asking this kind of question is reading csv file on >Spark is linearly increasing as the data size increase a bit, but reading >ORC format on Spark-SQL is still same as the

Re: reading ORC format on Spark-SQL

2016-02-10 Thread Gopal Vijayaraghavan
> The reason why I am asking this kind of question is reading csv file on >Spark is linearly increasing as the data size increase a bit, but reading >ORC format on Spark-SQL is still same as the data size increases in >. ... > This cause is from (just property of reading ORC forma

Re: reading ORC format on Spark-SQL

2016-02-10 Thread Philip Lee
a particular point because of Amdahl's law. This sentence is a bit confusing. So the time of reading a CSV file on Spark is linearly increasing as the data increases, because it employs the full cluster, which means it runs out of capacity? On the other hand, the reason why time of reading ORC format shows

Re: ORC format

2016-02-02 Thread Lefty Leverenz
as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries nor their > employees accept any

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility. From: Philip Lee [mailto:philjj...@gmail.com] Sent: 02 February 2016 16:10 To: user@hive.apache.o

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
nsibility. From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: 02 February 2016 10:26 To: user@hive.apache.org Subject: Re: ORC format Can't resist teasing Mich about this: "Indeed one often demoralises data taking advantages of massive parallel processing in Hive."

Re: ORC format

2016-02-02 Thread Philip Lee
sponsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries nor their > employees accept any responsibility. > > > > *From:* Lefty Leverenz [mailto:leftylever...@gmail.com] > *Sent:* 02 February 2016 10:26 >

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
o stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility. From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: 01 February 2016 15:58 To: user@hive.apach

Re: ORC format

2016-02-01 Thread Alan Gates
ous of some things. I know ORC format is faster on filtering or reading because it has indexing. Has it advantage of joining two tables of ORC dataset as well? Could you explain about it in detail? When experimenting, it seems like it has some advantages of joining in some aspect, but not qui

ORC format

2016-02-01 Thread Philip Lee
Hello, I am comparing the performance of some systems on ORC and CSV files. I read the ORC documentation on the Hive website, but I am still curious about some things. I know the ORC format is faster for filtering or reading because it has indexing. Does it have an advantage for joining two tables of ORC dataset

Re: ORC format

2016-02-01 Thread Philip Lee
com> wrote: > Hello, > > I experiment the performance of some systems between ORC and CSV file. > I read about ORC documentation on Hive website, but still curious of some > things. > > I know ORC format is faster on filtering or reading because it has > indexing. > Has i

Re: ORC format

2016-02-01 Thread Philip Lee
be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries n

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
rus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility. From: Philip Lee [mailto:philjj...@gmail.com] Sent: 01 February 2016 15:21 To: user@hive.apache.org Subject: ORC

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
heir employees accept any responsibility. From: Alan Gates [mailto:alanfga...@gmail.com] Sent: 01 February 2016 17:07 To: user@hive.apache.org Subject: Re: ORC format ORC does not currently expose a primary key to the user, though we have talked of having it do that. As Mich says the i

Is it worth of using ORC format in my case. Can I replace hive with HBase.

2015-08-06 Thread venkatesh b
Around 50 intermediate tables are generated while processing. Until now we have used text format for storage. We came across the ORC file format. Since each table is queried only once, I would like to know whether it is worth storing it in ORC format. Here we do full table scans. SECOND: I came to know about

Benefit of ORC format storing Sum, Min, Max...

2015-05-29 Thread sreejesh s
Hi, I am new to Hive, please help me understand the benefit of ORC file format storing Sum, Min, Max values. Whenever we try to find a sum of values in a particular column, it still runs the MapReduce job. select sum(col1) from orctable; select sum(col1) from txttable; For a sample file with

Re: Benefit of ORC format storing Sum, Min, Max...

2015-05-29 Thread Jagat Singh
Did you do table column stats? On 30 May 2015 9:04 am, sreejesh s sreejesh...@yahoo.com wrote: Hi, I am new to Hive, please help me understand the benefit of ORC file format storing Sum, Min, Max values. Whenever we try to find a sum of values in a particular column, it still runs the

Re: Error using ORC Format with Hive

2014-04-05 Thread Amit Tewari
speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception File filename could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

Re: Error using ORC Format with Hive

2014-04-05 Thread Prasanth Jayachandran
in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception File filename could

Error using ORC format

2014-04-04 Thread Amit Tewari
Hi All, I am just trying to do some simple tests to see speedup in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data

Error using ORC Format with Hive

2014-04-04 Thread Amit Tewari
Hi All, I am just trying to do some simple tests to see speedup in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data

Re: Error using ORC Format with Hive

2014-04-04 Thread Amit Tewari
how much I can speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception File filename could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded

Re: Error using ORC Format with Hive

2014-04-04 Thread Bryan Jeffrey
in hive query with Hive 0.14 (trunk version this morning). Just tried to use sample test case to start with. First wanted to see how much I can speed up using ORC format. However for some reason I can't insert data into the table with ORC format. It fails with Exception File filename could only