unsubscribe

2016-08-01 Thread zhang jp
unsubscribe


Re: How to pass the TestHplsqlDb test in hive-hplsql?

2016-08-01 Thread Zhenyi Zhao
Thank you Dmitry, I found it.

If I want to integrate my database (like Hive) with hplsql, is there a
test suite I can use to evaluate its compatibility?

Emerson

2016-08-01 20:18 GMT+08:00 Dmitry Tolpeko :

> Please try to find them in ./ql/src/test/queries/clientpositive directory
> (see topn.q file for example).
>
> Thanks,
> Dmitry
>
> On Mon, Aug 1, 2016 at 11:34 AM, Zhenyi Zhao  wrote:
>
>> Hi Dmitry,
>>
>> Thank you for your answer. You said “*src* and *sample_07* are
>> sample tables supplied with Hive”; where can I find information about
>> these tables?
>>
>> Emerson
>>
>> 2016-08-01 16:27 GMT+08:00 Dmitry Tolpeko :
>>
>>> Hi Emerson,
>>>
>>> I did not commit TestHplsqlDb.java since the Apache pre-commit test starts
>>> executing it, and I have not managed to get it to pass (there are connection
>>> errors). I can commit it as java_ to prevent execution, or someone can help
>>> with the connection errors.
>>>
>>> Some table DDLs are here:
>>> https://github.com/apache/hive/blob/master/hplsql/src/test/queries/db/schema.sql
>>>
>>> *src* and *sample_07* are sample tables supplied with Hive. By the way,
>>> *src* table is used in many Hive tests.
>>>
>>> Thanks,
>>> Dmitry
>>>
>>>
>>>
>>> On Mon, Aug 1, 2016 at 6:48 AM, Zhenyi Zhao  wrote:
>>>
 Hi all,

 There is a unit test class named "TestHplsqlDb.java" that I found at
 http://www.hplsql.org/downloads/hplsql-0.3.17-src.zip. But I want to
 know why this class doesn't exist at
 https://github.com/apache/hive/tree/master/hplsql.

 Now I want to test hplsql's compatibility with the Hive data source, but
 the tests failed.

 My question is how to pass the "TestHplsqlDb.java" test. I found that
 some tables are required by the tests, like src, sample_07, src_dt,
 and so on, but I could not find them. How do I initialize the
 test environment?

 I am waiting for your response. Thank you very much!

 Emerson

>>>
>>>
>>
>


Re: Doubt on Hive Partitioning.

2016-08-01 Thread Qiuzhuang Lian
Is this partition pruning fixed for MR too, not just Tez, in newer Hive
versions?

Regards,
Q

On Mon, Aug 1, 2016 at 8:48 PM, Jörn Franke  wrote:

> It happens in old Hive versions if the filter is only in the WHERE clause
> and NOT in the join clause. This should not happen in newer Hive versions.
> You can check it by running an EXPLAIN DEPENDENCY query.
>
> On 01 Aug 2016, at 11:07, Abhishek Dubey  > wrote:
>
> Hi All,
>
>
>
> I have a very big table *t* with billions of rows and it is partitioned
> on a column *p*. Column *p * has datatype text and values like ‘201601’,
> ‘201602’ … up to ‘201612’.
>
> And, I am running a query like : *Select columns from t where p=’201604’.*
>
>
>
> My question is: can there be a scenario/condition where my
> query will do a complete table scan on *t* instead of only reading data
> for the specified partition key? If yes, please shed some light on such
> scenarios.
>
>
>
> I’m asking this because someone told me that there is a probability that
> the query will ignore the partitioning and do a complete table scan to
> fetch output.
>
>
>
> *Thanks & Regards,*
> *Abhishek Dubey*
>
>
>
>


Re: Hive transactional table with delta files, Spark cannot read and sends error

2016-08-01 Thread Gopal Vijayaraghavan

> I am on Spark 1.6.1 and getting the following error

Ah, I realize that it's yet to be released officially.

Here's the demo from HadoopSummit -



I doubt this will ever be available for older Spark releases, but it will be
a datasource package like spark-redshift.

Cheers,
Gopal








Re: How can I force Hive to start compaction on a table immediately

2016-08-01 Thread Mich Talebzadeh
Thanks Alan.

One crude solution would be to copy the data from the ACID table to a plain
table and present that table to Spark to see the data.

This is basically a Spark optimiser issue, not the engine itself.

My Hive runs on the Spark query engine and all works fine there.
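
A minimal sketch of that workaround (payees_flat is a hypothetical name;
rerun the CTAS to refresh the copy after further updates):

hive> DROP TABLE IF EXISTS payees_flat;
hive> CREATE TABLE payees_flat STORED AS ORC AS SELECT * FROM payees;

The copy is a plain non-transactional table, so Spark can read it without
having to understand the ACID delta files.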

HTH

Dr Mich Talebzadeh






On 1 August 2016 at 23:47, Alan Gates  wrote:

> There’s no way to force immediate compaction.  If there are compaction
> workers in the metastore that aren’t busy they should pick that up
> immediately.  But there isn’t an ability to create a worker thread and
> start compacting.
>
> Alan.
>
> > On Aug 1, 2016, at 14:50, Mich Talebzadeh 
> wrote:
> >
> >
> > Rather than queuing it
> >
> > hive> alter table payees COMPACT 'major';
> > Compaction enqueued.
> > OK
> >
> > Thanks
> >
> > Dr Mich Talebzadeh
> >
> >
>
>


Re: How can I force Hive to start compaction on a table immediately

2016-08-01 Thread Alan Gates
There’s no way to force immediate compaction.  If there are compaction workers 
in the metastore that aren’t busy they should pick that up immediately.  But 
there isn’t an ability to create a worker thread and start compacting.

Alan.
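
For reference, a minimal sketch of queueing and then monitoring a compaction
(the table name is taken from this thread; the properties are the standard
metastore-side compactor settings):

hive> ALTER TABLE payees COMPACT 'major';  -- only enqueues the request
hive> SHOW COMPACTIONS;   -- watch it go initiated -> working -> ready for cleaning

For the request to be picked up at all, the metastore must have the compactor
enabled:

  hive.compactor.initiator.on=true
  hive.compactor.worker.threads=1   (or more)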

> On Aug 1, 2016, at 14:50, Mich Talebzadeh  wrote:
> 
> 
> Rather than queuing it
> 
> hive> alter table payees COMPACT 'major';
> Compaction enqueued.
> OK
> 
> Thanks
> 
> Dr Mich Talebzadeh
>  
>  



Re: Hive transactional table with delta files, Spark cannot read and sends error

2016-08-01 Thread Mich Talebzadeh
Thanks Gopal.

I am on Spark 1.6.1 and getting the following error

scala> var conn = LlapContext.newInstance(sc, hs2_url);
:28: error: not found: value LlapContext
 var conn = LlapContext.newInstance(sc, hs2_url);



Dr Mich Talebzadeh






On 1 August 2016 at 22:53, Gopal Vijayaraghavan  wrote:

>
>
> > Spark fails reading this table. What options do I have here?
>
> Would your issue be the same as
> https://issues.apache.org/jira/browse/SPARK-13129?
>
>
> LLAPContext in Spark can read those tables with ACID semantics (as in
> delete/updates will work right).
>
> var conn = LlapContext.newInstance(sc, hs2_url);
> var df: DataFrame = conn.sql("select * from payees").persist();
>
> Please be aware that's entirely in auto-commit mode, so you will be
> getting lazy snapshot isolation (hence, persist is a good idea).
>
> Even though "payees" is a placeholder, but this approach is intended for
> tables like that which have multiple consumers, the practical reason to
> use this pathway would be to apply specific masking/filtering by accessing
> user (like hide amounts or just fit amounts into ranges, like 0-99, 99-999
> etc instead of actual values for compliance audits without creating
> complete copies).
>
> Cheers,
> Gopal
>
>
>


Re: Hive transactional table with delta files, Spark cannot read and sends error

2016-08-01 Thread Gopal Vijayaraghavan


> Spark fails reading this table. What options do I have here?

Would your issue be the same as
https://issues.apache.org/jira/browse/SPARK-13129?


LLAPContext in Spark can read those tables with ACID semantics (as in
delete/updates will work right).

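// Note: LlapContext is from the LLAP datasource package mentioned in this
// thread, not stock Spark 1.6; hs2_url is assumed to be your HiveServer2 URL.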
var conn = LlapContext.newInstance(sc, hs2_url);
var df: DataFrame = conn.sql("select * from payees").persist();

Please be aware that's entirely in auto-commit mode, so you will be
getting lazy snapshot isolation (hence, persist is a good idea).

Even though "payees" is a placeholder, but this approach is intended for
tables like that which have multiple consumers, the practical reason to
use this pathway would be to apply specific masking/filtering by accessing
user (like hide amounts or just fit amounts into ranges, like 0-99, 99-999
etc instead of actual values for compliance audits without creating
complete copies).

Cheers,
Gopal




How can I force Hive to start compaction on a table immediately

2016-08-01 Thread Mich Talebzadeh
Rather than queuing it

hive> alter table payees COMPACT 'major';
Compaction enqueued.
OK

Thanks

Dr Mich Talebzadeh





Hive transactional table with delta files, Spark cannot read and sends error

2016-08-01 Thread Mich Talebzadeh
Hi,

This is an ORC transactional table

Hive 2, Spark 1.6.1

hive> show create table payees;
OK
CREATE TABLE `payees`(
  `transactiondescription` string,
  `hits` int,
  `hashtag` string)
CLUSTERED BY (
  transactiondescription)
INTO 256 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://rhes564:9000/user/hive/warehouse/accounts.db/payees'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='887',
  'numRows'='620',
  'orc.compress'='ZLIB',
  'rawDataSize'='0',
  'totalSize'='844236',
  'transactional'='true',
  'transient_lastDdlTime'='1470085025')


With a number of delta files due to updates

drwxr-xr-x   - hduser supergroup  0 2016-07-31 17:50
/user/hive/warehouse/accounts.db/payees/.hive-staging_hive_2016-07-31_17-50-32_170_8812909383478460781-1
drwxr-xr-x   - hduser supergroup  0 2016-07-31 17:52
/user/hive/warehouse/accounts.db/payees/.hive-staging_hive_2016-07-31_17-52-07_293_1488464319826411209-1
drwxr-xr-x   - hduser supergroup  0 2016-07-31 17:53
/user/hive/warehouse/accounts.db/payees/.hive-staging_hive_2016-07-31_17-53-50_895_2719298331883820815-1
drwxr-xr-x   - hduser supergroup  0 2016-07-31 17:55
/user/hive/warehouse/accounts.db/payees/.hive-staging_hive_2016-07-31_17-55-05_534_319547078132526-1
drwxr-xr-x   - hduser supergroup  0 2016-07-31 20:26
/user/hive/warehouse/accounts.db/payees/.hive-staging_hive_2016-07-31_20-25-47_734_1633377491601926952-1
drwxr-xr-x   - hduser supergroup  0 2016-07-29 21:01
/user/hive/warehouse/accounts.db/payees/delta_099_099_
drwxr-xr-x   - hduser supergroup  0 2016-07-29 21:02
/user/hive/warehouse/accounts.db/payees/delta_100_100_
drwxr-xr-x   - hduser supergroup  0 2016-07-29 21:20
/user/hive/warehouse/accounts.db/payees/delta_101_101_

Spark fails reading this table. What options do I have here?


And interestingly, Hive running on the Spark engine works fine.


Thanks


Dr Mich Talebzadeh





Re: Doubt on Hive Partitioning.

2016-08-01 Thread Gopal Vijayaraghavan


> WHERE p IN (SELECT p FROM t2)


> here we could argue that Hive could optimize this by computing the
> subquery first, and then do the partition pruning, but sadly I don't
> think this optimisation has been implemented yet

It is implemented already -


In Hive-1.x, the optimization doesn't kick in when the partition column
has a UDF wrapped around it.

In Hive-2.0, it does apply even if the partition column is wrapped with a
UDF.

"explain rewrite  where p IN (Select p from t2);"

will show the rewrite which enables DPP.

> Examples of non-deterministic functions are rand() and unix_timestamp(),
> because they are evaluated differently at each row

Yes, that is exactly right. Another case was TO_DATE() which in Hive-1.x
returned Strings and prevented the removal of partitions.

Cheers,
Gopal





Re: Doubt on Hive Partitioning.

2016-08-01 Thread Jörn Franke
It happens in old Hive versions if the filter is only in the WHERE clause and
NOT in the join clause. This should not happen in newer Hive versions. You can
check it by running an EXPLAIN DEPENDENCY query.
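
For example, against the table from the original question (t partitioned on
p), a sketch of that check:

hive> EXPLAIN DEPENDENCY SELECT * FROM t WHERE p = '201604';

The JSON output lists input_partitions; with pruning working you should see
only the p=201604 partition there instead of all twelve.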

> On 01 Aug 2016, at 11:07, Abhishek Dubey  wrote:
> 
> Hi All,
>  
> I have a very big table t with billions of rows and it is partitioned on a 
> column p. Column p has datatype text and values like ‘201601’, ‘201602’ …
> up to ‘201612’.
> And, I am running a query like : Select columns from t where p=’201604’.
>  
> My question is: can there be a scenario/condition where my query
> will do a complete table scan on t instead of only reading data for the
> specified partition key? If yes, please shed some light on such scenarios.
>  
> I’m asking this because someone told me that there is a probability that the 
> query will ignore the partitioning and do a complete table scan to fetch 
> output.
>  
> Thanks & Regards,
> Abhishek Dubey
>  


Re: Some dates add/less a day...

2016-08-01 Thread Julián Arocena
Thank you so much!

I'm testing it.

Best regards,

*Arocena Julian* | Developer


jaroc...@temperies.com


+54 249 4437 972
9 de Julio 509 | Tandil | Buenos Aires | Argentina

+1 (408)524-3071 I (650)704-7915

440 N. Wolfe Road, Sunnyvale CA 94085 | San Francisco | USA

www.temperies.com


2016-07-30 8:47 GMT-03:00 Andrew Sears :

> It is HIVE-13948.
>
>
> https://github.com/apache/hive/commit/da3ed68eda10533f3c50aae19731ac6d059cda87
>
> https://issues.apache.org/jira/browse/HIVE-13948
>
> Regards,
>
> Andrew
>
> On July 29, 2016 at 6:44 PM Julián Arocena  wrote:
>
> Hey, thank you so much! I was going crazy, you can imagine it :)
>
> Please let me know if you have it.
>
> I will have a nice weekend with this news
>
> Best regards,
> On 29/7/2016 18:44, "Andrew Sears" 
> wrote:
>
> Hi there,
>
> This is a critical bug fixed by a JIRA; I will see if I can get the number
> for you. It involves patching the lib/hive-* files.
>
> Cheers,
> Andrew
>
> On Fri, Jul 29, 2016 at 4:37 PM, Julián Arocena 
> wrote:
>
> Hi,
>
> I'm having a problem with some dates when using external tables over a text file.
> Let me give you an example:
>
> file content:
>
> *1946-10-01*
> 1946-10-02
>
>
> table:
>
> create external table date_issue_test
> (
>
> date_test Date
>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\001'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hive/test';
>
>
> Select * from date_issue_test;
>
> OK
> *1946-10-02*
> 1946-10-02
>
>
> As you can see, in this case it adds a day; there are a few cases like this.
>
> I also tried with a CAST and a fixed date, as below:
>
> hive> select CAST('1946-10-01' as date) from date_issue_test limit 1;
> OK
> 1946-10-02
>
>
> Any idea to help me?
>
> Thank you so much!
>
> *Julian*
>
>
>


Re: How to pass the TestHplsqlDb test in hive-hplsql?

2016-08-01 Thread Dmitry Tolpeko
Please try to find them in ./ql/src/test/queries/clientpositive directory
(see topn.q file for example).

Thanks,
Dmitry

On Mon, Aug 1, 2016 at 11:34 AM, Zhenyi Zhao  wrote:

> Hi Dmitry,
>
> Thank you for your answer. You said “*src* and *sample_07* are
> sample tables supplied with Hive”; where can I find information about
> these tables?
>
> Emerson
>
> 2016-08-01 16:27 GMT+08:00 Dmitry Tolpeko :
>
>> Hi Emerson,
>>
>> I did not commit TestHplsqlDb.java since the Apache pre-commit test starts
>> executing it, and I have not managed to get it to pass (there are connection
>> errors). I can commit it as java_ to prevent execution, or someone can help
>> with the connection errors.
>>
>> Some table DDLs are here:
>> https://github.com/apache/hive/blob/master/hplsql/src/test/queries/db/schema.sql
>>
>> *src* and *sample_07* are sample tables supplied with Hive. By the way,
>> *src* table is used in many Hive tests.
>>
>> Thanks,
>> Dmitry
>>
>>
>>
>> On Mon, Aug 1, 2016 at 6:48 AM, Zhenyi Zhao  wrote:
>>
>>> Hi all,
>>>
>>> There is a unit test class named "TestHplsqlDb.java" that I found at
>>> http://www.hplsql.org/downloads/hplsql-0.3.17-src.zip. But I want to
>>> know why this class doesn't exist at
>>> https://github.com/apache/hive/tree/master/hplsql.
>>>
>>> Now I want to test hplsql's compatibility with the Hive data source, but
>>> the tests failed.
>>>
>>> My question is how to pass the "TestHplsqlDb.java" test. I found that
>>> some tables are required by the tests, like src, sample_07, src_dt,
>>> and so on, but I could not find them. How do I initialize the
>>> test environment?
>>>
>>> I am waiting for your response. Thank you very much!
>>>
>>> Emerson
>>>
>>
>>
>


Re: Doubt on Hive Partitioning.

2016-08-01 Thread Furcy Pin
Hi Abhishek,

Yes, it can happen.

The only such scenarios I can think of are when you use a WHERE clause with
a non-constant predicate.
As far as I know, partition pruning only works on constant predicates, because
Hive has to evaluate them *before* starting the query in order to prune the
partitions.

For instance:

WHERE p = otherColumn
> here the predicate will depend on the row being read, thus all rows must
> be read.
> if otherColumn is a partition column, I don't think it works either

WHERE p IN (SELECT p FROM t2)
> here we could argue that Hive could optimize this by computing the
> subquery first, and then do the partition pruning, but sadly I don't
> think this optimisation has been implemented yet


WHERE f(p) = 'constant'
or
WHERE p = f('constant')

where f is a non-deterministic or stateful UDF.
Examples of non-deterministic functions are rand() and unix_timestamp(),
because they are evaluated differently at each row.

So if you want today's partition, you should instead use current_date(),
which is deterministic, since it takes the time of compilation of the
query.
It is only available since Hive 1.2.0, though.
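
A quick sketch of the difference (t and p are from the question; the
unix_timestamp variant is only illustrative):

-- pruned: current_date is folded to a constant at compile time
SELECT * FROM t WHERE p = date_format(current_date, 'yyyyMM');

-- not pruned on older Hive: unix_timestamp() is non-deterministic, so the
-- predicate cannot be evaluated before the scan starts
SELECT * FROM t WHERE p = from_unixtime(unix_timestamp(), 'yyyyMM');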

You can tell whether a Hive UDF is deterministic or stateful by looking at
the UDFType class annotation in its source code.
If you plan on writing your own UDF, don't forget to specify this
annotation as well.

hope this helps,

Furcy




On Mon, Aug 1, 2016 at 11:07 AM, Abhishek Dubey 
wrote:

> Hi All,
>
>
>
> I have a very big table *t* with billions of rows and it is partitioned
> on a column *p*. Column *p * has datatype text and values like ‘201601’,
> ‘201602’ … up to ‘201612’.
>
> And, I am running a query like : *Select columns from t where p=’201604’.*
>
>
>
> My question is: can there be a scenario/condition where my
> query will do a complete table scan on *t* instead of only reading data
> for the specified partition key? If yes, please shed some light on such
> scenarios.
>
>
>
> I’m asking this because someone told me that there is a probability that
> the query will ignore the partitioning and do a complete table scan to
> fetch output.
>
>
>
> *Thanks & Regards,*
> *Abhishek Dubey*
>
>
>


Doubt on Hive Partitioning.

2016-08-01 Thread Abhishek Dubey
Hi All,

I have a very big table t with billions of rows and it is partitioned on a 
column p. Column p has datatype text and values like '201601', '201602' …
up to '201612'.
And, I am running a query like : Select columns from t where p='201604'.

My question is: can there be a scenario/condition where my query
will do a complete table scan on t instead of only reading data for the
specified partition key? If yes, please shed some light on such scenarios.

I'm asking this because someone told me that there is a probability that the 
query will ignore the partitioning and do a complete table scan to fetch output.

Thanks & Regards,
Abhishek Dubey



Re: How to pass the TestHplsqlDb test in hive-hplsql?

2016-08-01 Thread Zhenyi Zhao
Hi Dmitry,

Thank you for your answer. You said “*src* and *sample_07* are sample
tables supplied with Hive”; where can I find information about these
tables?

Emerson

2016-08-01 16:27 GMT+08:00 Dmitry Tolpeko :

> Hi Emerson,
>
> I did not commit TestHplsqlDb.java since the Apache pre-commit test starts
> executing it, and I have not managed to get it to pass (there are connection
> errors). I can commit it as java_ to prevent execution, or someone can help
> with the connection errors.
>
> Some table DDLs are here:
> https://github.com/apache/hive/blob/master/hplsql/src/test/queries/db/schema.sql
>
> *src* and *sample_07* are sample tables supplied with Hive. By the way,
> *src* table is used in many Hive tests.
>
> Thanks,
> Dmitry
>
>
>
> On Mon, Aug 1, 2016 at 6:48 AM, Zhenyi Zhao  wrote:
>
>> Hi all,
>>
>> There is a unit test class named "TestHplsqlDb.java" that I found at
>> http://www.hplsql.org/downloads/hplsql-0.3.17-src.zip. But I want to
>> know why this class doesn't exist at
>> https://github.com/apache/hive/tree/master/hplsql.
>>
>> Now I want to test hplsql's compatibility with the Hive data source, but
>> the tests failed.
>>
>> My question is how to pass the "TestHplsqlDb.java" test. I found that
>> some tables are required by the tests, like src, sample_07, src_dt,
>> and so on, but I could not find them. How do I initialize the
>> test environment?
>>
>> I am waiting for your response. Thank you very much!
>>
>> Emerson
>>
>
>


Re: How to pass the TestHplsqlDb test in hive-hplsql?

2016-08-01 Thread Dmitry Tolpeko
Hi Emerson,

I did not commit TestHplsqlDb.java since the Apache pre-commit test starts
executing it, and I have not managed to get it to pass (there are connection
errors). I can commit it as java_ to prevent execution, or someone can help
with the connection errors.

Some table DDLs are here:
https://github.com/apache/hive/blob/master/hplsql/src/test/queries/db/schema.sql

*src* and *sample_07* are sample tables supplied with Hive. By the way,
*src* table is used in many Hive tests.

Thanks,
Dmitry
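
For what it is worth, in Hive's own test setup src is a simple two-column
text table loaded from the kv1.txt fixture in the source tree, roughly:

CREATE TABLE src (key STRING, value STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/path/to/hive-source/data/files/kv1.txt'
OVERWRITE INTO TABLE src;

(the path is a placeholder for your checkout; sample_07 ships as sample data
with some Hive distributions, and the schema.sql linked above has the DDLs
the HPL/SQL tests expect)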



On Mon, Aug 1, 2016 at 6:48 AM, Zhenyi Zhao  wrote:

> Hi all,
>
> There is a unit test class named "TestHplsqlDb.java" that I found at
> http://www.hplsql.org/downloads/hplsql-0.3.17-src.zip. But I want to know
> why this class doesn't exist at
> https://github.com/apache/hive/tree/master/hplsql.
>
> Now I want to test hplsql's compatibility with the Hive data source, but
> the tests failed.
>
> My question is how to pass the "TestHplsqlDb.java" test. I found that some
> tables are required by the tests, like src, sample_07, src_dt,
> and so on, but I could not find them. How do I initialize the
> test environment?
>
> I am waiting for your response. Thank you very much!
>
> Emerson
>


Re: Hive on spark

2016-08-01 Thread Mich Talebzadeh
Hi,

You can download the pdf from here


HTH

Dr Mich Talebzadeh






On 1 August 2016 at 03:05, Chandrakanth Akkinepalli <
chandrakanth.akkinepa...@gmail.com> wrote:

> Hi Dr.Mich,
> Can you please share your London meetup presentation. Curious to see the
> comparison according to you of various query engines.
>
> Thanks,
> Chandra
>
> On Jul 28, 2016, at 12:13 AM, Mich Talebzadeh 
> wrote:
>
> Hi,
>
> I made a presentation in London on 20th July on this subject. In it I
> explained how to make Spark work as an execution engine for Hive:
>
> Query Engines for Hive, MR, Spark, Tez and LLAP – Considerations
>
> See if I can send the presentation
>
> Cheers
>
>
> Dr Mich Talebzadeh
>
>
>
>
>
>
> On 28 July 2016 at 04:24, Mudit Kumar  wrote:
>
>> Yes Mich, exactly.
>>
>> Thanks,
>> Mudit
>>
>> From: Mich Talebzadeh 
>> Reply-To: 
>> Date: Thursday, July 28, 2016 at 1:08 AM
>> To: user 
>> Subject: Re: Hive on spark
>>
>> You mean you want to run Hive using Spark as the execution engine which
>> uses Yarn by default?
>>
>>
>> Something like below
>>
>> hive> select max(id) from oraclehadoop.dummy_parquet;
>> Starting Spark Job = 8218859d-1d7c-419c-adc7-4de175c3ca6d
>> Query Hive on Spark job[1] stages:
>> 2
>> 3
>> Status: Running (Hive on Spark job[1])
>> Job Progress Format
>> CurrentTime StageId_StageAttemptId:
>> SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
>> [StageCost]
>> 2016-07-27 20:38:17,269 Stage-2_0: 0(+8)/24 Stage-3_0: 0/1
>> 2016-07-27 20:38:20,298 Stage-2_0: 8(+4)/24 Stage-3_0: 0/1
>> 2016-07-27 20:38:22,309 Stage-2_0: 11(+1)/24Stage-3_0: 0/1
>> 2016-07-27 20:38:23,330 Stage-2_0: 12(+8)/24Stage-3_0: 0/1
>> 2016-07-27 20:38:26,360 Stage-2_0: 17(+7)/24Stage-3_0: 0/1
>> 2016-07-27 20:38:27,386 Stage-2_0: 20(+4)/24Stage-3_0: 0/1
>> 2016-07-27 20:38:28,391 Stage-2_0: 21(+3)/24Stage-3_0: 0/1
>> 2016-07-27 20:38:29,395 Stage-2_0: 24/24 Finished   Stage-3_0: 1/1
>> Finished
>> Status: Finished successfully in 13.14 seconds
>> OK
>> 1
>> Time taken: 13.426 seconds, Fetched: 1 row(s)
>>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>>
>> On 27 July 2016 at 20:31, Mudit Kumar  wrote:
>>
>>> Hi All,
>>>
>>> I need to configure a Hive cluster to use the Spark engine (on YARN).
>>> I already have a running Hadoop cluster.
>>>
>>> Can someone point me to relevant documentation?
>>>
>>> TIA.
>>>
>>> Thanks,
>>> Mudit
>>>
>>
>>
>
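
For Mudit's original question, a minimal configuration sketch (the property
names are from the standard Hive on Spark setup; the values are placeholders
to adjust for your cluster):

hive> set hive.execution.engine=spark;
hive> set spark.master=yarn-client;
hive> set spark.executor.memory=4g;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Hive also needs the Spark assembly jar on its classpath (e.g. linked into
$HIVE_HOME/lib); the "Hive on Spark: Getting Started" page on the Hive wiki
covers the full steps.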


Sample programs, for read/write of "timestamp data" to Parquet-files

2016-08-01 Thread Ravi Tatapudi
Hello,

I have a test application (a stand-alone Java program) for reading (and
writing) Parquet files. The program is built using the Parquet-Avro API.
Using this program I could read datatypes such as CHAR, VARCHAR, INT,
FLOAT, DOUBLE, etc., but it fails to read "timestamp" data from
Parquet files created by Hive.

After checking further, I realized that the latest released version of
the Parquet-Avro API (version 1.8.1) doesn't support "timestamp", but I see
that Hive processes "timestamp" with nanosecond precision.

So I am trying to understand where I can find sample Java programs, built
using the Hive API, that can process "timestamp" data from the
Parquet files created by Hive.

Could you please point me to sample programs for developing test
applications that read/write "timestamp" data to/from Parquet files?

Thanks,
 Ravi
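
A minimal HiveQL sketch to reproduce the incompatibility (the table name is
hypothetical):

CREATE TABLE ts_parquet (ts TIMESTAMP) STORED AS PARQUET;
INSERT INTO TABLE ts_parquet VALUES ('2016-08-01 12:34:56.123456789');

Hive writes the TIMESTAMP as the Parquet INT96 physical type, which
parquet-avro 1.8.1 has no Avro mapping for; that is why timestamp reads fail
while the other primitive types work.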