Looking for a PR reviewer for the bug fixed: HIVE-25912

2022-02-08 Thread Fred Bai
Hi everyone:

I have fixed a Hive bug; the Jira issue is:
https://issues.apache.org/jira/browse/HIVE-25912

My PR is: https://github.com/apache/hive/pull/2987

How can I find a reviewer?

Thanks.


Re: metastore bug when hive update spark table ?

2022-01-06 Thread Mich Talebzadeh
Well, I have seen this type of error before.

I tend to create the table in Hive first and alter it in Spark if needed.
This is Spark 3.1.1 with Hive 3.1.1:

0: jdbc:hive2://rhes75:10099/default> create table my_table2 (col1 int, col2 int);
0: jdbc:hive2://rhes75:10099/default> describe my_table2;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| col1      | int        |          |
| col2      | int        |          |
+-----------+------------+----------+
2 rows selected (0.17 seconds)

in Spark

>>> spark.sql("""ALTER TABLE my_table2 ADD column col3 string""")
DataFrame[]
>>> for c in spark.sql("""describe formatted my_table2 """).collect():
...   print(c)
...
Row(col_name='col1', data_type='int', comment=None)
Row(col_name='col2', data_type='int', comment=None)
Row(col_name='col3', data_type='string', comment=None)
Row(col_name='', data_type='', comment='')
Row(col_name='# Detailed Table Information', data_type='', comment='')
Row(col_name='Database', data_type='default', comment='')
Row(col_name='Table', data_type='my_table2', comment='')
Row(col_name='Owner', data_type='hduser', comment='')
Row(col_name='Created Time', data_type='Thu Jan 06 17:16:37 GMT 2022', comment='')
Row(col_name='Last Access', data_type='UNKNOWN', comment='')
Row(col_name='Created By', data_type='Spark 2.2 or prior', comment='')
Row(col_name='Type', data_type='MANAGED', comment='')
Row(col_name='Provider', data_type='hive', comment='')
Row(col_name='Table Properties', data_type='[bucketing_version=2, transient_lastDdlTime=1641489641]', comment='')
Row(col_name='Location', data_type='hdfs://rhes75:9000/user/hive/warehouse/my_table2', comment='')
Row(col_name='Serde Library', data_type='org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe', comment='')
Row(col_name='InputFormat', data_type='org.apache.hadoop.mapred.TextInputFormat', comment='')
Row(col_name='OutputFormat', data_type='org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', comment='')
Row(col_name='Storage Properties', data_type='[serialization.format=1]', comment='')
Row(col_name='Partition Provider', data_type='Catalog', comment='')


This is my workaround.

HTH


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





metastore bug when hive update spark table ?

2022-01-06 Thread Nicolas Paris
Hi there.

I also posted this problem on the Spark list. I am not sure whether this is
a Spark or a Hive metastore problem, or whether there is some metastore
tuning configuration that could serve as a workaround.


Spark can't see Hive schema updates, partly because Spark stores its own
copy of the schema in the Hive metastore table parameters.


1. FROM SPARK: create a table
==
>>> spark.sql("select 1 col1, 2 col2").write.format("parquet").saveAsTable("my_table")
>>> spark.table("my_table").printSchema()
root
|-- col1: integer (nullable = true)
|-- col2: integer (nullable = true)


2. FROM HIVE: alter the schema
==
0: jdbc:hive2://localhost:1> ALTER TABLE my_table REPLACE
COLUMNS(`col1` int, `col2` int, `col3` string);
0: jdbc:hive2://localhost:1> describe my_table;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| col1      | int        |          |
| col2      | int        |          |
| col3      | string     |          |
+-----------+------------+----------+


3. FROM SPARK: problem, column does not appear
==
>>> spark.table("my_table").printSchema()
root
|-- col1: integer (nullable = true)
|-- col2: integer (nullable = true)


4. FROM METASTORE DB: two ways of storing the columns
==
metastore=# select * from "COLUMNS_V2";
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX
-------+---------+-------------+-----------+-------------
     2 |         | col1        | int       |           0
     2 |         | col2        | int       |           1
     2 |         | col3        | string    |           2


metastore=# select * from "TABLE_PARAMS";
 TBL_ID | PARAM_KEY                         | PARAM_VALUE
--------+-----------------------------------+-------------
      1 | spark.sql.sources.provider        | parquet
      1 | spark.sql.sources.schema.part.0   | {"type":"struct","fields":[{"name":"col1","type":"integer","nullable":true,"metadata":{}},{"name":"col2","type":"integer","nullable":true,"metadata":{}}]}
      1 | spark.sql.create.version          | 2.4.8
      1 | spark.sql.sources.schema.numParts | 1
      1 | last_modified_time                | 1641483180
      1 | transient_lastDdlTime             | 1641483180
      1 | last_modified_by                  | anonymous

metastore=# truncate "TABLE_PARAMS";
TRUNCATE TABLE


5. FROM SPARK: now the column magically appears
==
>>> spark.table("my_table").printSchema()
root
|-- col1: integer (nullable = true)
|-- col2: integer (nullable = true)
|-- col3: string (nullable = true)


So is it necessary to store that schema copy in TABLE_PARAMS at all?
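A narrower alternative to truncating TABLE_PARAMS wholesale would be to rewrite the stored spark.sql.sources.schema.part.0 JSON so it matches the Hive columns. The helper below is only a hypothetical sketch of that JSON edit (add_column is not a Spark or Hive API), and writing the value back would still require something like ALTER TABLE ... SET TBLPROPERTIES; editing metastore properties by hand is at your own risk.

```python
import json


def add_column(schema_json: str, name: str, spark_type: str) -> str:
    """Append a column to a Spark-serialized struct schema.

    schema_json is the value of spark.sql.sources.schema.part.0,
    e.g. {"type":"struct","fields":[...]} as shown in TABLE_PARAMS above.
    """
    schema = json.loads(schema_json)
    schema["fields"].append(
        {"name": name, "type": spark_type, "nullable": True, "metadata": {}}
    )
    # Re-serialize compactly, matching the format Spark writes.
    return json.dumps(schema, separators=(",", ":"))


# Example mirroring the thread: start from the stored two-column schema
# and add the col3 that was added through Hive.
stored = ('{"type":"struct","fields":['
          '{"name":"col1","type":"integer","nullable":true,"metadata":{}},'
          '{"name":"col2","type":"integer","nullable":true,"metadata":{}}]}')
updated = add_column(stored, "col3", "string")
print(updated)
```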




Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Julien Tane
yes  we used HDP for this. So it might come from there.



Julien Tane
Big Data Engineer

[Tel.]  +49 721 98993-393
[Fax]   +49 721 98993-66
[E-Mail]j...@solute.de<mailto:j...@solute.de>


solute GmbH
Zeppelinstraße 15
76185 Karlsruhe
Germany


Marken der solute GmbH | brands of solute GmbH

Geschäftsführer | Managing Director: Dr. Thilo Gans, Bernd Vermaaten
Webseite | www.solute.de <http://www.solute.de/>
Sitz | Registered Office: Karlsruhe
Registergericht | Register Court: Amtsgericht Mannheim
Registernummer | Register No.: HRB 110579
USt-ID | VAT ID: DE234663798



Informationen zum Datenschutz | Information about privacy policy
https://www.solute.de/ger/datenschutz/grundsaetze-der-datenverarbeitung.php








Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Tim Havens
Unsubscribe


Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Sungwoo Park
I have tested the script with Hive 2.3.6, Hive 3.1.2, and Hive
4.0.0-SNAPSHOT (all with minor modifications), and have not found any
problem. So, I guess all the master branches are fine.

If Hive 3.0.0.3.1 is the release included in HDP 3.0.0 or HDP 3.0.1, I
remember that this Hive-LLAP/Tez release was not stable. So, it could be a
problem specific to the release in HDP 3.0.0/3.0.1.

--- Sungwoo


Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Peter Vary
Hi Deepak,

If I were you, I would test your repro case on the master branch.

- If it is fixed, I think you should try to find the commit which solves the
  problem and cherry-pick it to branch-3 and branch-3.1, so the fix is there
  in the next release.
- If the problem is still present on the master branch, then take a look at
  https://cwiki.apache.org/confluence/display/Hive/HowToContribute which
  describes the development process for Hive.

Thanks,
Peter




Count bug in Hive 3.0.0.3.1

2020-04-27 Thread Deepak Krishna
Hi team,

We came across a bug related to the count function. We are using Hive
3.0.0.3.1 with Tez 0.9.0.3.1. Please find attached the queries to
replicate the issue.

Please register this as a bug, and let us know if we can support in any way
to fix the issue. It would also be helpful to know whether there are any
other workarounds for this issue.

Thanks and Regards,
Deepak Krishna





Deepak Krishna
Big Data Engineer

[Tel.]
[Fax]   +49 721 98993-
[E-Mail]hs-d...@solute.de


solute GmbH
Zeppelinstraße 15
76185 Karlsruhe
Germany


Marken der solute GmbH | brands of solute GmbH

Geschäftsführer | Managing Director: Dr. Thilo Gans, Bernd Vermaaten
Webseite | www.solute.de 
Sitz | Registered Office: Karlsruhe
Registergericht | Register Court: Amtsgericht Mannheim
Registernummer | Register No.: HRB 110579
USt-ID | VAT ID: DE234663798



Informationen zum Datenschutz | Information about privacy policy
https://www.solute.de/ger/datenschutz/grundsaetze-der-datenverarbeitung.php





count_bug.sql
Description: count_bug.sql


Re: bug in hive

2014-09-23 Thread Alan Gates

Shushant,

Creating a patched jar that would include the lock functionality you
want is unlikely to work.  Wouldn't the following workflow work for you:

1. Writer locks the table explicitly via LOCK TABLE
2. Writer inserts
3. Writer unlocks the table explicitly via UNLOCK TABLE

If you're using ZK for your locking, I think the client dying (as opposed
to ending the session) should cause the lock to expire.  If not, you may
have to ensure the unlock happens in your application.  Hope that helps.
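The three-step workflow above can be sketched as a statement sequence. The table name, insert statement, and the Hive client used to execute the statements are illustrative assumptions; in a real application the UNLOCK should be issued in a finally block so it runs even if the insert fails.

```python
def locked_insert(table: str, insert_sql: str) -> list[str]:
    """Statement sequence for the workflow above:
    explicit exclusive lock, insert, explicit unlock."""
    return [
        f"LOCK TABLE {table} EXCLUSIVE",   # 1. writer locks the table
        insert_sql,                        # 2. writer inserts
        f"UNLOCK TABLE {table}",           # 3. writer unlocks the table
    ]


# Example with hypothetical table/partition names; each statement would be
# executed in order through whatever Hive client the application uses,
# with the UNLOCK placed in a finally block.
stmts = locked_insert(
    "sales",
    "INSERT INTO TABLE sales PARTITION (dt) SELECT * FROM staging",
)
print(stmts[0])  # LOCK TABLE sales EXCLUSIVE
```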


Alan.


Shushant Arora <shushantaror...@gmail.com>
September 20, 2014 at 8:00
Hi Alan

I have Hive 0.10 deployed in my org's cluster; I cannot upgrade it
because of org policy.
How can I achieve exclusive-lock functionality while inserting into a
dynamic partition on Hive 0.10?
Would calling Hive scripts via some sort of Java API, with a patched jar
included, help?
Moreover, Hive 0.10 does not release locks when the Hive session is
killed; the user has to explicitly unlock the table.

Can I specify any sort of maximum expiry time when taking a lock?

Thanks
Shushant







--
Sent with Postbox http://www.getpostbox.com

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: bug in hive

2014-09-23 Thread Shushant Arora
Hi Alan,
1. When a writer takes an exclusive lock, Hive won't allow anyone (even
the session that holds the lock) to write to the table.
Do I need to pass a lock handle to the read query, or am I missing something here?
2. Or do you mean to insert using the Hadoop filesystem, not Hive?




Re: bug in hive

2014-09-22 Thread John Omernik
Shushant -

What I believe Stephen is sarcastically trying to say is that some
organizational education may be in order here. Hive itself is not even at
version 1.0; those of us who use Hive in production know this and have to
accept that there will be bugs like the one you are trying to address.
There MAY be a workaround that takes more hours and introduces other bugs
into your environment; alternatively, take the time to explain why moving
forward from Hive 0.10 to Hive 0.14 really is in the best interest of your
organization.  Perhaps you can do a proof of concept using Hive 0.14, i.e.
copy the metastore to another SQL server and move the table's data to
another location, so you can prove out the fix for your issue. Also,
perhaps there is a way to test the current workflows that work on 0.10
against 0.14, so you can show that this change really is the right way to
move.

Being at this level in an open source project has huge benefits, but
challenges as well. On one hand you can be much more nimble in your
environment because open source is fluid; but if you are operating in an
environment that doesn't allow you to move the way you need to, you may be
losing a long-term war while winning short-term battles.  I guess what I am
saying is similar to Stephen, but I highly recommend you work with the team
that sets the policy and develop a new way to address how Hive and similar
projects live within your change-management policies.  You will benefit
greatly in the long run.

John




Re: bug in hive

2014-09-21 Thread Shushant Arora
Hi Stephen

We have a Cloudera setup deployed in our cluster, which we cannot update
due to org policy.
Until it is updated to version 0.14, how can I achieve the locking
feature? Please advise.


On Sun, Sep 21, 2014 at 10:40 AM, Stephen Sprague sprag...@gmail.com
wrote:

 great policy. install open source software that's not even version 1.0
 into production and then not allow the ability to improve it (but of course
 reap all the rewards of its benefits.)  so instead of actually fixing the
 problem the right way introduce a super-hack work-around cuz, you know,
 that's much more stable.

 Gotta luv it.   Good luck.



bug in hive

2014-09-20 Thread Shushant Arora
Hive version 0.9 and later has a bug



While inserting into a Hive table, Hive takes an exclusive lock. But if
the table is partitioned and the insert is into a dynamic partition, it
takes a shared lock on the table; whereas if all partitions are static,
Hive takes an exclusive lock on the partitions into which data is being
inserted, and a shared lock on the table.

https://issues.apache.org/jira/browse/HIVE-3509


1.What if I want to take exclusive lock on table while inserting in dynamic
partition ?


I tried to take explicit lock using :

LOCK TABLE tablename EXCLUSIVE;


But it made table to be disabled.

I cannot even read from table anymore even is same session until I do

unlock table tablename in another session;


2. moreover whats lock level in hive , I mean any user can remove any other
users lock. that too seems buggy.


Thanks

Shushant


Re: bug in hive

2014-09-20 Thread Alan Gates
Up until Hive 0.13, locks in Hive were really advisory only, since as you 
note any user can remove any other user's lock.  In Hive 0.13 a new type 
of locking was introduced; see 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-LockManager  
This new locking is automatic and ignores both LOCK and UNLOCK 
commands.  Note that it is off by default; you have to configure Hive to 
use the new DbTxnManager to turn on this locking.  In 0.13 it still 
has the bug you describe as far as acquiring the wrong lock for dynamic 
partitioning, but I believe I've fixed that in 0.14.


Alan.
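For reference, switching to the DbTxnManager that Alan describes is done in hive-site.xml, roughly as follows. This is a sketch based on the Hive Transactions wiki page linked above; check the exact property names and values against your Hive version:

```xml
<!-- hive-site.xml: enable the new DB-backed lock manager (sketch) -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
```

With these set, lock acquisition happens automatically per statement and explicit LOCK/UNLOCK commands are ignored, as noted above.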





Re: bug in hive

2014-09-20 Thread Shushant Arora
Hi Alan,

I have Hive 0.10 deployed in my org's cluster; I cannot upgrade it
because of org policy.
How can I achieve exclusive-lock functionality while inserting into a
dynamic partition on Hive 0.10?
Would calling Hive scripts via some sort of Java API with a patched jar
included help?
Moreover, Hive 0.10 does not release locks when the Hive session is killed;
the user has to explicitly unlock the table.
Can I specify any sort of max expiry time when taking a lock?

Thanks
Shushant



Re: bug in hive

2014-09-20 Thread Stephen Sprague
great policy. install open source software that's not even version 1.0 into
production and then not allow the ability to improve it (but of course reap
all the rewards of its benefits.)  so instead of actually fixing the
problem the right way introduce a super-hack work-around cuz, you know,
that's much more stable.

Gotta luv it.   Good luck.






Bug in Hive Partition windowing functions?

2014-04-29 Thread Keith
Hi, 

we have an issue with a windowing function query that never completes when
running against a large dataset (> 25,000 rows). The reducer
(only one) never exits and appears stuck in an infinite loop. 
I looked at the reducer counters and they never changed over the 6 hours it 
was stuck in the loop.

When the data set is small (< 25K rows), it runs fine.

Is there any workaround for this issue? We tested
against Hive 0.11/0.12/0.13 and the result is the same.

create table window_function_fail
as
select a.*,
sum(case when bprice is not null then 1 else 0 end) over (partition by
date, name order by otime, bprice, aprice desc ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) bidpid
from
large_table a;

create table large_table(
date   string,
name   string,
stime  string,
bprice decimal,
aprice decimal,
otime  double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' stored as textfile;

Thanks in advance.
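To make the intended semantics of the bidpid column concrete, here is a small Python sketch of what the windowed sum computes: a running count of non-null bprice values within each (date, name) partition, ordered by otime. The data is hypothetical and the ordering is simplified (it ignores the bprice/aprice tie-breakers from the query):

```python
from itertools import groupby
from operator import itemgetter

# hypothetical rows: (date, name, otime, bprice)
rows = [
    ("2014-04-01", "A", 1.0, 10.0),
    ("2014-04-01", "A", 2.0, None),
    ("2014-04-01", "A", 3.0, 12.0),
    ("2014-04-02", "A", 1.0, None),
]

def bidpid(rows):
    """Running count of non-null bprice per (date, name), ordered by otime."""
    out = []
    part = itemgetter(0, 1)                      # partition key: (date, name)
    for _, grp in groupby(sorted(rows, key=part), key=part):
        running = 0
        for r in sorted(grp, key=itemgetter(2)): # order by otime within partition
            running += 1 if r[3] is not None else 0
            out.append(r + (running,))
    return out

print([r[-1] for r in bidpid(rows)])  # [1, 1, 2, 0]
```

This runs in one pass per partition, so the query itself should not loop; the hang described above appears to be in the reducer, not in the window semantics.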





Bug in Hive Split function (Tested on Hive 0.9 and 0.11)

2013-10-09 Thread John Omernik
Hello all, I think I have outlined a bug in the hive split function:

Summary: When calling split on a string of data, it will only return all
array items if the last array item has a value. For example, if I have
a string of text delimited by tab with 7 columns, and the first four are
filled but the last three are blank, split will only return a 4-position
array. If any number of middle columns are empty but the last item
still has a value, then it returns the proper number of columns.  This
was tested in Hive 0.9 and Hive 0.11.

Data:
(Note \t represents a tab char, \x09 the line endings should be \n (UNIX
style) not sure what email will do to them).  Basically my data is 7 lines
of data with the first 7 letters separated by tab.  On some lines I've left
out certain letters, but kept the number of tabs exactly the same.

input.txt
a\tb\tc\td\te\tf\tg
a\tb\tc\td\te\t\tg
a\tb\t\td\t\tf\tg
\t\t\td\te\tf\tg
a\tb\tc\td\t\t\t
a\t\t\t\te\tf\tg
a\t\t\td\t\t\tg

I then created a table with one column from that data:


DROP TABLE tmp_jo_tab_test;

CREATE table tmp_jo_tab_test (message_line STRING)

STORED AS TEXTFILE;


LOAD DATA LOCAL INPATH '/tmp/input.txt'

OVERWRITE INTO TABLE tmp_jo_tab_test;


Ok just to validate I created a python counting script:


#!/usr/bin/python

import sys

for line in sys.stdin:
    line = line[0:-1]          # strip the trailing newline
    out = line.split("\t")     # split on literal tab characters
    print len(out)


The output there is:

$ cat input.txt | ./cnt_tabs.py

7

7

7

7

7

7

7


Based on that information, split on tab should return me 7 for each line as
well:


hive -e "select size(split(message_line, '\\t')) from tmp_jo_tab_test;"

7

7

7

7

4

7

7


However, it does not.  It would appear that the line where only the first
four letters are filled in (and blank is passed for the last three) only
returns 4 splits, where there should technically be 7: 4 for the letters
included, and three blanks.


a\tb\tc\td\t\t\t
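The behaviour described above matches Java's String.split, which with the default limit drops trailing empty strings, while Python's str.split keeps them (hence the counting script's 7s). That Hive's split UDF inherits this from Java regex splitting is an assumption here, not confirmed in the thread. A small Python sketch of the contrast:

```python
def java_like_split(s, sep):
    # Java's String.split(regex) with limit 0 drops trailing empty strings;
    # emulate that on top of Python's split, which keeps them.
    parts = s.split(sep)
    while parts and parts[-1] == "":
        parts.pop()
    return parts

line = "a\tb\tc\td\t\t\t"                 # the problem line from input.txt
print(len(line.split("\t")))              # 7 -- Python keeps trailing empties
print(len(java_like_split(line, "\t")))   # 4 -- matches Hive's observed output
```

In Java the workaround is to pass a negative limit (split(regex, -1)), which keeps trailing empty strings.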


Re: Bug in Hive Split function (Tested on Hive 0.9 and 0.11)

2013-10-09 Thread John Omernik
I opened a JIRA on this: https://issues.apache.org/jira/browse/HIVE-5506





Re: BUG IN HIVE-4650 seems not fixed

2013-08-01 Thread wzc1989
Hi Yin,
Thanks for the patch. I applied it and this test case now passes; I will use it
in our Hive 0.11 production testing.




Re: Re: BUG IN HIVE-4650 seems not fixed

2013-07-31 Thread Yin Huai
Seems it is another problem.
Can you try

SELECT *
FROM (SELECT VAL001 x1,
 VAL002 x2,
 VAL003 x3,
 VAL004 x4,
 VAL005 y
  FROM (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,
   (VAL002- mu2) * 1/(sd2) VAL002,
   (VAL003- mu3) * 1/(sd3) VAL003,
   (VAL004- mu4) * 1/(sd4) VAL004,
   (VAL005- mu5) * 1/(sd5) VAL005
FROM (SELECT x1 VAL001,
 x2 VAL002,
 x3 VAL003,
 x4 VAL004,
 y VAL005
  FROM cmnt) v3
JOIN (SELECT count(*) c,
 avg(VAL001) mu1,
 avg(VAL002) mu2,
 avg(VAL003) mu3,
 avg(VAL004) mu4,
 avg(VAL005) mu5,
 stddev_pop(VAL001) sd1,
 stddev_pop(VAL002) sd2,
 stddev_pop(VAL003) sd3,
 stddev_pop(VAL004) sd4,
 stddev_pop(VAL005) sd5
  FROM (SELECT *
FROM (SELECT x1 VAL001,
 x2 VAL002,
 x3 VAL003,
 x4 VAL004,
 y VAL005
  FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6;

Also, cmnt in v3 will be used to create the hash table. It seems the part of
the code that converts a Join into a MapJoin does not play well with this
part of your original query:

SELECT *
 FROM
   (SELECT x1 VAL001,
   x2 VAL002,
   x3 VAL003,
   x4 VAL004,
   y VAL005
FROM cmnt) obj1_3) v3


I have created https://issues.apache.org/jira/browse/HIVE-4968 to address
this issue.





Re: Re: BUG IN HIVE-4650 seems not fixed

2013-07-31 Thread Yin Huai
I just uploaded a patch to https://issues.apache.org/jira/browse/HIVE-4968.
You can try it and see if the problem has been resolved for your query.


 

Re: BUG IN HIVE-4650 seems not fixed

2013-07-29 Thread wzc1989
Hi,
I attach the output of EXPLAIN. The Hive I use is compiled from trunk
and my Hadoop version is 1.0.1. I use the default Hive configuration.


--
wzc1...@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)



explain.txt
Description: Binary data


BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread wzc1989
hi all:

We are currently testing Hive 0.11 against our production environment and have run
into some problems. Some of them are related to the param
hive.auto.convert.join.
We disabled this param and some failing test cases passed. By searching the Hive
JIRA issues I found that the patch in
HIVE-4650 (https://issues.apache.org/jira/browse/HIVE-4650) may be helpful.
I compiled the newest code in trunk and tried the failing test case in HIVE-4650,
but it doesn't pass. It seems this issue is not fixed even though it's closed.

Am I missing something?

--
wzc1...@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
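As noted above, disabling automatic mapjoin conversion made the failing test cases pass, so a session-level workaround until the fix lands would be (a sketch, not a substitute for the actual fix):

```sql
-- work around the mapjoin NPE by disabling automatic join conversion
SET hive.auto.convert.join=false;
-- then re-run the HIVE-4650 query
```

This trades away the small-table mapjoin optimization for the affected queries, so it is best scoped to the sessions that hit the bug.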

Re: BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread wzc1989
What I mean by "not pass the test case in HIVE-4650" is that I compiled the trunk
code and ran the query from HIVE-4650:
SELECT *
FROM
  (SELECT VAL001 x1,
  VAL002 x2,
  VAL003 x3,
  VAL004 x4,
  VAL005 y
   FROM
 (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,(VAL002- mu2) * 
1/(sd2) VAL002,(VAL003- mu3) * 1/(sd3) VAL003,(VAL004- mu4) * 1/(sd4) 
VAL004,(VAL005- mu5) * 1/(sd5) VAL005
  FROM
(SELECT *
 FROM
   (SELECT x1 VAL001,
   x2 VAL002,
   x3 VAL003,
   x4 VAL004,
   y VAL005
FROM cmnt) obj1_3) v3
  JOIN
(SELECT count(*) c,
avg(VAL001) mu1,
avg(VAL002) mu2,
avg(VAL003) mu3,
avg(VAL004) mu4,
avg(VAL005) mu5,
stddev_pop(VAL001) sd1,
stddev_pop(VAL002) sd2,
stddev_pop(VAL003) sd3,
stddev_pop(VAL004) sd4,
stddev_pop(VAL005) sd5
 FROM
   (SELECT *
FROM
  (SELECT x1 VAL001,
  x2 VAL002,
  x3 VAL003,
  x4 VAL004,
  y VAL005
   FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6 ;


and it still fails at the same place:
…
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:611)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:186)
... 14 more


--
wzc1...@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Sunday, July 28, 2013 at 8:08 PM, wzc1...@gmail.com wrote:

 hi all:

 We are currently testing Hive 0.11 against our production environment and
 have run into some problems. Some of them are related to the parameter
 hive.auto.convert.join.
 After disabling this parameter, some of the failing testcases passed. Searching
 the Hive JIRA issues, I found that the patch in
 HIVE-4650 (https://issues.apache.org/jira/browse/HIVE-4650) might be relevant.
 I compiled the newest code on trunk and tried the failing testcase from HIVE-4650,
 but it doesn't pass. It seems that this issue is not actually fixed even though
 it is marked as closed.

 Am I missing something?
  
 --
 wzc1...@gmail.com (mailto:wzc1...@gmail.com)
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
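For anyone hitting the same NullPointerException, the workaround described in the report (disabling automatic map-join conversion) is a single session-level setting; a minimal sketch, assuming Hive 0.11 defaults otherwise:

```sql
-- Disable automatic conversion of common joins to map joins for this session,
-- so the query falls back to a reduce-side join and never enters MapJoinOperator.
SET hive.auto.convert.join=false;
```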



Re: BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread Yin Huai
Hi,

Can you also post the output of EXPLAIN? The execution plan may be helpful
to locate the problem.
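The plan being requested can be produced without running the job by prefixing the query with EXPLAIN (EXPLAIN EXTENDED adds per-operator detail); a sketch, with the full query from the report elided:

```sql
-- Dump the operator plan for the failing query without executing it.
EXPLAIN EXTENDED
SELECT * FROM ( ... ) obj1_6;  -- substitute the full mapjoin query here
```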

Thanks,

Yin


On Sun, Jul 28, 2013 at 8:06 PM, wzc1...@gmail.com wrote:






FW: a potential bug in HIVE/HADOOP ? -- MetaStore, createDatabase()

2011-12-14 Thread Bing Li

fyi
--- On Wed, Dec 14, 2011, Bing Li lib...@yahoo.com.cn wrote:

From: Bing Li lib...@yahoo.com.cn
Subject: a potential bug in HIVE/HADOOP ? -- MetaStore, createDatabase()
To: hive dev list d...@hive.apache.org
Date: Wed, Dec 14, 2011, 8:32 PM

Hi, developers,
When I ran the Hive unit tests with the candidate build of Hive 0.8.0, I found
that TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore always FAILED under
the ROOT account while they PASSED under a NON-ROOT account.

I took a look at the source code of TestHiveMetaStore and found:

      fs.mkdirs(
          new Path(HiveConf.getVar(hiveConf,
              HiveConf.ConfVars.METASTOREWAREHOUSE) + "/test"),
          new FsPermission((short) 0));

      client.createDatabase(db);   // always succeeds when run as ROOT

Do the Hive unit tests only support running under a NON-ROOT account? If not, I
think this may be a potential defect/bug in HADOOP/HIVE.
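The ROOT/NON-ROOT split is consistent with ordinary POSIX permission semantics: a directory created with mode 0 blocks a normal user, but root bypasses permission checks entirely, so a test that relies on mkdirs(..., new FsPermission((short) 0)) to force a failure behaves differently under ROOT. A minimal sketch of the underlying OS behavior (in Python rather than Hive's Java, purely as an illustration):

```python
import os
import stat
import tempfile

# Create a subdirectory with permission 0, analogous to the test's
# fs.mkdirs(new Path(warehouse + "/test"), new FsPermission((short) 0)).
warehouse = tempfile.mkdtemp()
test_dir = os.path.join(warehouse, "test")
os.mkdir(test_dir, 0o000)  # mode 0: no read/write/execute bits for anyone

# The directory really has no permission bits set.
print(oct(stat.S_IMODE(os.stat(test_dir).st_mode)))  # 0o0

# A normal user cannot write into a mode-0 directory, but root bypasses the
# permission check -- which is why the metastore tests only fail as expected
# when the suite runs under a NON-ROOT account.
can_write = os.access(test_dir, os.W_OK)
print(can_write)  # False for a normal user, True when running as root
```

This mirrors the report: under ROOT, the mode-0 warehouse directory does not actually block `client.createDatabase(db)`, so the assertion the test depends on never trips.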


Thanks,
- Bing