RE: Impact of partitioning on certain queries

2016-01-08 Thread Mich Talebzadeh
Well that is debatable.

 

The following table sales is partitioned in Oracle but has local bitmap indexes 
that help the query.

 

select * from sales where prod_id = 10;

 

no rows selected

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 511273406

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |   347 | 10063 |    93   (0)| 00:00:02 |       |       |
|   1 |  PARTITION RANGE ALL                 |                |   347 | 10063 |    93   (0)| 00:00:02 |     1 |    28 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID  | SALES          |   347 | 10063 |    93   (0)| 00:00:02 |     1 |    28 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |                |       |       |            |          |       |       |
|*  4 |     BITMAP INDEX SINGLE VALUE        | SALES_PROD_BIX |       |       |            |          |     1 |    28 |
-----------------------------------------------------------------------------------------------------------------------

 

Obviously at this stage we do not have local indexes in Hive. That could make 
it more efficient for searches and, IMO, would be a great tool.

 

Cheers,

 

 

Dr Mich Talebzadeh

 

LinkedIn   

 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

 

 http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

  http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Jörn Franke [mailto:jornfra...@gmail.com] 
Sent: 08 January 2016 06:20
To: user@hive.apache.org
Subject: Re: Impact of partitioning on certain queries

 

This observation is correct, and it is the same behavior as you see in other 
databases supporting partitions. Usually you should avoid many small partitions.


On 07 Jan 2016, at 23:53, Mich Talebzadeh wrote:

Ok, we hope that partitioning improves performance where the predicate is on 
the partitioned columns.

 

I have two tables. One is a basic table called smallsales, defined as below:

 

CREATE TABLE `smallsales`(
  `prod_id` bigint,
  `cust_id` bigint,
  `time_id` timestamp,
  `channel_id` bigint,
  `promo_id` bigint,
  `quantity_sold` decimal(10,0),
  `amount_sold` decimal(10,0))
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/smallsales'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 

Re: Impact of partitioning on certain queries

2016-01-08 Thread Jörn Franke
Well, you use a text format for your data, so you should not be surprised. For 
text-based formats, such as CSV, you can always use the Hive bitmap index. I do 
not think it makes much sense to compare processing CSV files with the 
internal tables of a relational database.


RE: Impact of partitioning on certain queries

2016-01-08 Thread Mich Talebzadeh
Interesting point below:

 

Well you use a text format for your data so you should not be surprised. For 
text based formats, such as csv, you can always use the hive bitmap index.

 

 

How can one create a bitmap index in Hive please?

 

 


Hive UDF accessing https request

2016-01-08 Thread Prabhu Joseph
Hi Experts,

   I am trying to write a Hive UDF that makes an HTTPS request and returns a
result based on the response. From plain Java the HTTPS response comes back,
but when accessed from the UDF the result is null.

Can anyone review the code below and share the correct way to do this?


create temporary function profoundIP as 'com.network.logs.udf.ProfoundIp';

select ip,profoundIP(ip) as info from r_distinct_ips_temp;
 //returns NULL


//Below UDF program

package com.network.logs.udf;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import javax.net.ssl.HttpsURLConnection;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Note: the CREATE FUNCTION above registers 'com.network.logs.udf.ProfoundIp',
// which must match this class's fully qualified name.
public class ProfoundNew extends UDF {

    // evaluate must be public: Hive resolves it via reflection
    // and only considers public methods
    public Text evaluate(Text input) {
        String url = "https://api2.profound.net/ip/" + input.toString()
                + "?view=enterprise";
        URL obj;
        try {
            obj = new URL(url);

            HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();

            con.setRequestMethod("GET");
            con.setRequestProperty("Authorization",
                    "ProfoundAuth apikey=cisco-065ccfec619011e38f");

            int responseCode = con.getResponseCode();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream()));
            String inputLine;
            StringBuffer response = new StringBuffer();

            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
            in.close();
            return new Text(response.toString());
        } catch (Exception e) {
            e.printStackTrace(); // swallowing the exception here hides the real error
        }
        return null;
    }
}



Thanks,
Prabhu Joseph


Re: Hive UDF accessing https request

2016-01-08 Thread Sergey Shelukhin
To start with, you can remove the try-catch so that the exception is not 
swallowed and you can see whether an error occurs.
However, note that this is an anti-pattern for any reasonably sized dataset: 
the UDF makes one HTTPS round trip per row.

From: Prabhu Joseph
Reply-To: "user@hive.apache.org"
Date: Friday, January 8, 2016 at 00:51
To: "user@hive.apache.org", "d...@hive.apache.org"
Subject: Hive UDF accessing https request




Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Ophir Etzion
It didn't work, assuming I did the right thing.
In the properties you can see

{"key":"hive.aux.jars.path","value":"file:///data/loko/foursquare.web-hiverc/current/hadoop-hive-serde.jar,file:///data/loko/foursquare.web-hiverc/current/hadoop-hive-udf.jar","isFinal":false,"resource":"programatically"}

which includes the jar that has the class I need, but I still get

org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to
find class: com.foursquare.hadoop.hive.io.HiveThriftSequenceFileInputFormat


On Fri, Jan 8, 2016 at 12:24 PM, Edward Capriolo 
wrote:

> You can not 'add jar' input formats and serde's. They need to be part of
> your auxlib.
>
> On Fri, Jan 8, 2016 at 12:19 PM, Ophir Etzion 
> wrote:
>
>> I tried now. still getting
>>
>> 16/01/08 16:37:34 ERROR exec.Utilities: Failed to load plan: 
>> hdfs://hadoop-alidoro-nn-vip/tmp/hive/hive/c2af9882-38a9-42b0-8d17-3f56708383e8/hive_2016-01-08_16-36-41_370_3307331506800215903-3/-mr-10004/3c90a796-47fc-4541-bbec-b196c40aefab/map.xml:
>>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
>> class: com.foursquare.hadoop.hive.io.HiveThriftSequenceFileInputFormat
>> Serialization trace:
>> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
>> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
>> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
>> class: com.foursquare.hadoop.hive.io.HiveThriftSequenceFileInputFormat
>>
>>
>> HiveThriftSequenceFileInputFormat is in one of the jars I'm trying to add.
>>
>>
>> On Thu, Jan 7, 2016 at 9:58 PM, Prem Sure  wrote:
>>
>>> did you try -- jars property in spark submit? if your jar is of huge
>>> size, you can pre-load the jar on all executors in a common available
>>> directory to avoid network IO.
>>>
>>> On Thu, Jan 7, 2016 at 4:03 PM, Ophir Etzion 
>>> wrote:
>>>
I'm trying to add jars before running a query using hive on spark on cdh
 5.4.3.
 I've tried applying the patch in
 https://issues.apache.org/jira/browse/HIVE-12045 (manually as the
 patch is done on a different hive version) but still hasn't succeeded.

 did anyone manage to do ADD JAR successfully with CDH?

 Thanks,
 Ophir

>>>
>>>
>>
>


bitmap index on FACT table

2016-01-08 Thread Mich Talebzadeh
Hi,

 

I have the usual SALES fact table with 5 million rows, partitioned by year
and month.

 

I created 5 bitmap indexes, all on the foreign keys from the DIMENSION tables,
as below:

 

0: jdbc:hive2://rhes564:10010/default> show index on sales;

+--------------------+-----------+-------------+-------------------------------------------+-----------+----------+
|      idx_name      | tab_name  |  col_names  |               idx_tab_name                | idx_type  | comment  |
+--------------------+-----------+-------------+-------------------------------------------+-----------+----------+
| sales_cust_bix     | sales     | cust_id     | oraclehadoop__sales_sales_cust_bix__      | bitmap    |          |
| sales_channel_bix  | sales     | channel_id  | oraclehadoop__sales_sales_channel_bix__   | bitmap    |          |
| sales_prod_bix     | sales     | prod_id     | oraclehadoop__sales_sales_prod_bix__      | bitmap    |          |
| sales_promo_bix    | sales     | promo_id    | oraclehadoop__sales_sales_promo_bix__     | bitmap    |          |
| sales_time_bix     | sales     | time_id     | oraclehadoop__sales_sales_time_bix__      | bitmap    |          |
+--------------------+-----------+-------------+-------------------------------------------+-----------+----------+

 

Now I would like to see the usage of the bitmap indexes when I do something
simple like the query below

 

0: jdbc:hive2://rhes564:10010/default> explain dependency select prod_id,
count(prod_id)

0: jdbc:hive2://rhes564:10010/default> from sales

0: jdbc:hive2://rhes564:10010/default> group by prod_id;

+----------------------------------------------------------------------------------------------------+--+
|                                               Explain                                              |
+----------------------------------------------------------------------------------------------------+--+
(EXPLAIN output truncated in the archive)


Re: How to increase the mapper number in my case

2016-01-08 Thread Ankit Bhatnagar
Check these:

mapred.max.split.size
mapred.min.split.size
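
A sketch of applying those settings (the 128 MB value assumes the default HDFS block size; the effective mapper count also depends on the input format, since CombineHiveInputFormat merges small splits):

```sql
-- Cap the split size at one HDFS block (assumed 128 MB here) so an
-- 8-block file is divided into roughly 8 splits, i.e. ~8 mappers.
set mapred.max.split.size=134217728;
set mapred.min.split.size=1;
select count(1) from A;
```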


How to increase the mapper number in my case

2016-01-08 Thread Todd
Hi,

I have Hadoop (2.6.0, pseudo-distributed mode) and Hive (1.2.1) installed on my 
local machine. I have a table A; its underlying file takes up 8 HDFS blocks.
When I run a query like

select count(1) from A

I see only 1 mapper task in the result; I thought it should be equal to the 
number of blocks.

How can I configure Hive so that the number of mappers equals the number of 
blocks?

Thanks!


Does hive(1.2.1) support to automatically detect parquet schema?

2016-01-08 Thread Todd
Hi,

I would like to ask whether Hive (1.2.1) supports automatic detection of Parquet schemas.
Thanks.


RE: Impact of partitioning on certain queries

2016-01-08 Thread Mich Talebzadeh
Thanks Gopal.

 

Basically the following is true:

 

1.The storage layer is HDFS

2.The execution engine is MR, Tez, Spark etc

3.The access layer is Hive

 

When we say the access layer is Hive, is the assumption correct that we are
referring to the optimiser (loosely analogous to the optimiser in an RDBMS)?
For example, is the Hive optimiser aware of the number of underlying
partitions? The reason I ask is that with EXPLAIN I only see a table scan,
and it does not refer to any partition or partition elimination.

 

 

Cheers

 

 


 

 

-Original Message-
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 08 January 2016 09:34
To: user@hive.apache.org
Subject: Re: Impact of partitioning on certain queries

 

 

> Ok we hope that partitioning improves performance where the predicate
> is on partitioned columns

 

Nope.

 

Partitioning *only* improves performance if your queries run with

 

set hive.mapred.mode=strict;

 

That's the "use strict" easy way to make sure you're writing good queries.
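
A minimal sketch of what strict mode enforces (the table and partition columns here are illustrative, not from the thread):

```sql
SET hive.mapred.mode=strict;

-- Assume a table partitioned by (year, month):
-- CREATE TABLE sales_p (prod_id BIGINT, amount_sold DECIMAL(10,0))
-- PARTITIONED BY (year INT, month INT);

-- Rejected in strict mode: no predicate on the partition columns,
-- so every partition would be scanned.
-- SELECT COUNT(1) FROM sales_p;

-- Accepted: the partition predicate lets Hive prune partitions at compile time.
SELECT COUNT(1) FROM sales_p WHERE year = 2015 AND month = 12;
```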

 

Even then, schema design in hive is something you need to learn with the
assumption that neither the storage layer, nor the compute layer is part of
"hive".

 

It floats itself in an "access" layer above both. Not sure there's any
legacy tech to draw parallels with that.

 

If you haven't seen this before, here's an example of the problem

 

 

http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24

 

 

Cheers,

Gopal



Re: Impact of partitioning on certain queries

2016-01-08 Thread Jörn Franke
https://snippetessay.wordpress.com/2015/07/25/hive-optimizations-with-indexes-bloom-filters-and-statistics/
Maybe a compact index makes more sense if you have high cardinality columns
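
For reference, a hedged sketch of the index DDL being discussed (Hive's CREATE INDEX syntax; the sales table and columns are taken from earlier in the thread):

```sql
-- Bitmap index, suited to low-cardinality columns
CREATE INDEX sales_prod_bix ON TABLE sales (prod_id)
AS 'BITMAP'
WITH DEFERRED REBUILD;

-- Compact index, often the better choice for high-cardinality columns
CREATE INDEX sales_cust_cix ON TABLE sales (cust_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- An index is empty until it is rebuilt
ALTER INDEX sales_prod_bix ON sales REBUILD;
ALTER INDEX sales_cust_cix ON sales REBUILD;
```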


Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Ophir Etzion
Thanks!
In certain use cases you could, but I forgot about the aux thing; that's
probably it.



Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Ophir Etzion
I tried now. still getting

16/01/08 16:37:34 ERROR exec.Utilities: Failed to load plan:
hdfs://hadoop-alidoro-nn-vip/tmp/hive/hive/c2af9882-38a9-42b0-8d17-3f56708383e8/hive_2016-01-08_16-36-41_370_3307331506800215903-3/-mr-10004/3c90a796-47fc-4541-bbec-b196c40aefab/map.xml:
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to
find class: com.foursquare.hadoop.hive.io.HiveThriftSequenceFileInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to
find class: com.foursquare.hadoop.hive.io.HiveThriftSequenceFileInputFormat


HiveThriftSequenceFileInputFormat is in one of the jars I'm trying to add.




Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Edward Capriolo
You can not 'add jar' input formats and serde's. They need to be part of
your auxlib.



Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Edward Capriolo
Yes, you can add UDFs via ADD JAR. But strangely, the classpath of 'the
driver' of the Hive process does not seem to be able to use InputFormats
and SerDes that have been added to the session via ADD JAR.
At one point I understood why. This is probably something we should ticket
and come up with a more elegant solution for.

>