Re: percentile_approx slowness

2014-09-25 Thread j.barrett Strausser
Not an answer to your question, but you can compute approximate percentiles
with only the memory overhead of a single integer (two integers if you
want better results):

http://link.springer.com/chapter/10.1007/978-3-642-40273-9_7

So you could pretty easily implement the above algorithm as a Python UDF
and then have a reduce step that averages the results.
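
A minimal Python sketch of that estimator (Frugal-1U from the linked paper;
untested, and the target quantile here is illustrative):

import random

def frugal_1u(stream, q=0.95, estimate=0):
    # One integer of state: nudge the estimate up with probability q when a
    # larger item arrives, down with probability (1 - q) on a smaller one.
    for x in stream:
        r = random.random()
        if x > estimate and r > 1 - q:
            estimate += 1
        elif x < estimate and r > q:
            estimate -= 1
    return estimate

Each mapper would emit its final estimate; the reduce step then just
averages them.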





On Thu, Sep 25, 2014 at 3:06 PM, Kevin Weiler 
wrote:

>  Hi All,
>
>  I have a query that attempts to compute percentiles on some datasets
> that are well in excess of 100,000,000 rows, and I have thus opted to use
> percentile_approx as we are routinely overrunning the memory. I’m having
> trouble finding a threshold at which I want to begin sampling. Before this
> dataset got so large, the maximum number of rows I would need to include in
> the percentile was about 1,000,000. I’ve tried using 1,000,000 as a
> sampling threshold, 100,000, and even the default 10,000. For some reason
> this query, which previously took about 20 minutes to run, is now taking
> around 13 hours to complete (in the case of 100,000 as my sampling rate).
> Are there some Hive settings I should be investigating to see if I can have
> this query complete in a reasonable time?
>
> --
>   *Kevin Weiler*
>
> IT
>  IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
> 60606 | http://imc-chicago.com/
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com



-- 


https://github.com/bearrito
@deepbearrito


percentile_approx slowness

2014-09-25 Thread Kevin Weiler
Hi All,

I have a query that attempts to compute percentiles on some datasets that are
well in excess of 100,000,000 rows, and I have thus opted to use
percentile_approx as we are routinely overrunning the memory. I’m having
trouble finding a threshold at which I want to begin sampling. Before this
dataset got so large, the maximum number of rows I would need to include in
the percentile was about 1,000,000. I’ve tried using 1,000,000 as a sampling
threshold, 100,000, and even the default 10,000. For some reason this query,
which previously took about 20 minutes to run, is now taking around 13 hours
to complete (in the case of 100,000 as my sampling rate). Are there some Hive
settings I should be investigating to see if I can have this query complete
in a reasonable time?
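
For reference, the threshold I am varying is the optional third argument to
percentile_approx, which bounds the size of the histogram it keeps in memory
(larger means more accurate but heavier). An illustrative call — the column
and table names here are placeholders, not our real schema:

SELECT percentile_approx(latency_ms, 0.95, 100000) AS p95
FROM my_table;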

--
Kevin Weiler
IT
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com






RE: Hiveserver2 crash with RStudio (using RJDBC)

2014-09-25 Thread Nathalie Blais
Hello Vaibhav,

Thanks a lot for your quick response!

I will grab a heapdump as soon as I have “the ok to crash the server” and 
attach it to this thread.  In the meantime, regarding our metastore, it looks 
like it is remote (excerpt from our hive-site.xml below):


<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://server_name:9083</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>300</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>true</value>
</property>


On a side note, the forum might have received my inquiry several times.  I had 
a bit of trouble sending it and I retried a few times; please disregard any 
dupes of this request.

Thanks!

-- Nathalie

From: Vaibhav Gumashta [mailto:vgumas...@hortonworks.com]
Sent: 25 septembre 2014 03:52
To: user@hive.apache.org
Subject: Re: Hiveserver2 crash with RStudio (using RJDBC)

Nathalie,

Can you grab a heapdump at the time the server crashes (export this to your 
environment: HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath= $HADOOP_CLIENT_OPTS".)? What type of 
metastore are you using with HiveServer2 - embedded (if you specify -hiveconf 
hive.metastore.uris=" " in the HiveServer2 startup command, it uses embedded 
metastore) or remote?

Thanks,
--Vaibhav

On Mon, Sep 22, 2014 at 10:55 AM, Nathalie Blais
<nathalie.bl...@ubisoft.com> wrote:
Hello,

We are currently experiencing a severe reproducible hiveserver2 crash when 
using the RJDBC connector in RStudio (please refer to the description below for 
the detailed test case).  We have a hard time pinpointing the source of the 
problem and we are wondering whether this is a known issue or we have a glitch 
in our configuration; we would sincerely appreciate your input on this case.

Case
Severe Hiveserver2 crash when returning “a certain” volume of data (really not 
that big) to RStudio through RJDBC

Config Versions
Hadoop Distribution: Cloudera – cdh5.0.1p0.47
Hiveserver2: 0.12
RStudio: 0.98.1056
RJDBC: 0.2-4

How to Reproduce

1.   In a SQL client application (Aqua Data Studio was used for the purpose 
of this example), create Hive test table

a.   create table test_table_connection_crash(col1 string);

2.   Load data into table (data file attached)

a.   LOAD DATA INPATH '/user/test/testFile.txt' INTO TABLE 
test_table_connection_crash;

3.   Verify row count

a.   select count(*) nbRows from test_table_connection_crash;

b.  720 000 rows

4.   Display all rows

a.   select * from test_table_connection_crash order by col1 desc

b.  All the rows are returned by the Map/Reduce to the client and displayed 
properly in the interface

5.   Open RStudio

6.   Create connection to Hive

a.   library(RJDBC)

b.  drv <- JDBC(driverClass="org.apache.hive.jdbc.HiveDriver", 
classPath=list.files("D:/myJavaDriversFolderFromClusterInstall/", 
pattern="jar$", full.names=T), identifier.quote="`")

c.   conn <- dbConnect(drv, 
"jdbc:hive2://server_name:1/default;ssl=true;sslTrustStore=C:/Progra~1/Java/jdk1.7.0_60/jre/lib/security/cacerts;trustStorePassword=pswd",
 "user", "password")

7.   Verify connection with a small query

a.   r <- dbGetQuery(conn, "select * from test_table_connection_crash order 
by col1 desc limit 100")

b.  print(r)

c.   100 rows are returned to RStudio and properly displayed in the console 
interface

8.   Remove the limit and try the original query (as performed in the SQL 
client application)

a.   r <- dbGetQuery(conn, "select * from test_table_connection_crash order 
by col1 desc")

b.  Query starts running

c.   *** Cluster crash ***

Worst case, if the RStudio desktop client cannot handle such an amount of
data, we might expect the desktop application to crash; not the whole
hiveserver2.
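
As a possible client-side workaround (a sketch only, assuming RJDBC’s
standard DBI fetch interface; the chunk size is illustrative), the result set
could be streamed in chunks instead of materialized by a single dbGetQuery:

res <- dbSendQuery(conn, "select * from test_table_connection_crash order by col1 desc")
repeat {
  chunk <- fetch(res, n = 10000)   # at most 10000 rows per round trip
  if (nrow(chunk) == 0) break
  # ... process chunk ...
}
dbClearResult(res)

That said, this only reduces how much the client requests at once; it does
not explain why the server itself goes down.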

Please let us know whether or not you are aware of any issues of the kind.  
Also, please do not hesitate to request any configuration file you might need 
to examine.

Thank you very much!

Best regards,

Nathalie


[dna_signature]

Nathalie Blais
B.I. Developer | Technology Group
Ubisoft Montreal







Re: oozie installation error

2014-09-25 Thread pratik khadloya
I see that you are getting an OOM error. I had the same error when I was
trying to build yesterday.
I got a little further by setting MAVEN_OPTS=-Xmx512m.

# The command I was using...
$ MAVEN_OPTS=-Xmx512m mvn clean install -DskipTests -Phadoop-1
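
For what it's worth, the "error=12, Cannot allocate memory" in your trace is
the OS refusing to fork /bin/sh for Maven's child process, not the JVM heap
itself overflowing, so freeing up memory on the build box is the direction
to push. A hedged sketch (the swap file path and sizes are illustrative):

# Cap the Maven JVM and add temporary swap so fork() can succeed.
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
sudo chmod 600 /swapfile
sudo mkswap /swapfile && sudo swapon /swapfile
MAVEN_OPTS=-Xmx512m mvn clean install -DskipTests -Phadoop-2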


~Pratik

On Thu, Sep 25, 2014 at 7:14 AM, Rahul Channe 
wrote:

> Hi Muthu,
>
> I tried running the command for hadoop-1, but it failed with the error
> below; it seems that the hadoop-1 profile does not exist in POM.xml. Can I
> manually add it?
>
> [INFO] Apache Oozie Core . FAILURE
> [59.151s]
> [INFO] Apache Oozie Docs . SKIPPED
> [INFO] Apache Oozie Share Lib Pig  SKIPPED
> [INFO] Apache Oozie Share Lib Hive ... SKIPPED
> [INFO] Apache Oozie Share Lib Sqoop .. SKIPPED
> [INFO] Apache Oozie Share Lib Streaming .. SKIPPED
> [INFO] Apache Oozie Share Lib Distcp . SKIPPED
> [INFO] Apache Oozie WebApp ... SKIPPED
> [INFO] Apache Oozie Examples . SKIPPED
> [INFO] Apache Oozie Share Lib  SKIPPED
> [INFO] Apache Oozie Tools  SKIPPED
> [INFO] Apache Oozie MiniOozie  SKIPPED
> [INFO] Apache Oozie Distro ... SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 8:26.792s
> [INFO] Finished at: Thu Sep 25 10:04:30 EDT 2014
> [INFO] Final Memory: 58M/142M
> [INFO]
> 
> [WARNING] The requested profile "hadoop-1" could not be activated because
> it does not exist.
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
> project oozie-core: Error assembling JAR: Failed to read filesystem
> attributes for: /home/user/oozie/core/pom.xml: Failed to retrieve numeric
> file attributes using: '/bin/sh -c ls -1nlad
> /home/user/oozie/core/pom.xml': Error while executing process. Cannot run
> program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>
>
>
> Later I changed the command to hadoop-2, but it failed at the stage below
>
> [INFO] Apache Oozie Share Lib HCatalog ... SUCCESS [9.909s]
> [INFO] Apache Oozie Core . FAILURE
> [22.937s]
> [INFO] Apache Oozie Docs . SKIPPED
> [INFO] Apache Oozie Share Lib Pig  SKIPPED
> [INFO] Apache Oozie Share Lib Hive ... SKIPPED
> [INFO] Apache Oozie Share Lib Sqoop .. SKIPPED
> [INFO] Apache Oozie Share Lib Streaming .. SKIPPED
> [INFO] Apache Oozie Share Lib Distcp . SKIPPED
> [INFO] Apache Oozie WebApp ... SKIPPED
> [INFO] Apache Oozie Examples . SKIPPED
> [INFO] Apache Oozie Share Lib  SKIPPED
> [INFO] Apache Oozie Tools  SKIPPED
> [INFO] Apache Oozie MiniOozie  SKIPPED
> [INFO] Apache Oozie Distro ... SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 1:31.348s
> [INFO] Finished at: Thu Sep 25 10:11:20 EDT 2014
> [INFO] Final Memory: 77M/185M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
> project oozie-core: Error assembling JAR: Failed to read filesystem
> attributes for: /home/user/oozie/core/pom.xml: Failed to retrieve numeric
> file attributes using: '/bin/sh -c ls -1nlad
> /home/user/oozie/core/pom.xml': Error while executing process. Cannot run
> program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
>
>
> On Thu, Sep 25, 2014 at 7:42 AM, Muthu Pandi  wrote:
>
>> replace hadoop-2 

Re: oozie installation error

2014-09-25 Thread Rahul Channe
Hi Muthu,

I tried running the command for hadoop-1, but it failed with the error below;
it seems that the hadoop-1 profile does not exist in POM.xml. Can I manually
add it?

[INFO] Apache Oozie Core . FAILURE [59.151s]
[INFO] Apache Oozie Docs . SKIPPED
[INFO] Apache Oozie Share Lib Pig  SKIPPED
[INFO] Apache Oozie Share Lib Hive ... SKIPPED
[INFO] Apache Oozie Share Lib Sqoop .. SKIPPED
[INFO] Apache Oozie Share Lib Streaming .. SKIPPED
[INFO] Apache Oozie Share Lib Distcp . SKIPPED
[INFO] Apache Oozie WebApp ... SKIPPED
[INFO] Apache Oozie Examples . SKIPPED
[INFO] Apache Oozie Share Lib  SKIPPED
[INFO] Apache Oozie Tools  SKIPPED
[INFO] Apache Oozie MiniOozie  SKIPPED
[INFO] Apache Oozie Distro ... SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 8:26.792s
[INFO] Finished at: Thu Sep 25 10:04:30 EDT 2014
[INFO] Final Memory: 58M/142M
[INFO]

[WARNING] The requested profile "hadoop-1" could not be activated because
it does not exist.
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
project oozie-core: Error assembling JAR: Failed to read filesystem
attributes for: /home/user/oozie/core/pom.xml: Failed to retrieve numeric
file attributes using: '/bin/sh -c ls -1nlad
/home/user/oozie/core/pom.xml': Error while executing process. Cannot run
program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
[Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.



Later I changed the command to hadoop-2, but it failed at the stage below

[INFO] Apache Oozie Share Lib HCatalog ... SUCCESS [9.909s]
[INFO] Apache Oozie Core . FAILURE [22.937s]
[INFO] Apache Oozie Docs . SKIPPED
[INFO] Apache Oozie Share Lib Pig  SKIPPED
[INFO] Apache Oozie Share Lib Hive ... SKIPPED
[INFO] Apache Oozie Share Lib Sqoop .. SKIPPED
[INFO] Apache Oozie Share Lib Streaming .. SKIPPED
[INFO] Apache Oozie Share Lib Distcp . SKIPPED
[INFO] Apache Oozie WebApp ... SKIPPED
[INFO] Apache Oozie Examples . SKIPPED
[INFO] Apache Oozie Share Lib  SKIPPED
[INFO] Apache Oozie Tools  SKIPPED
[INFO] Apache Oozie MiniOozie  SKIPPED
[INFO] Apache Oozie Distro ... SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 1:31.348s
[INFO] Finished at: Thu Sep 25 10:11:20 EDT 2014
[INFO] Final Memory: 77M/185M
[INFO]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
project oozie-core: Error assembling JAR: Failed to read filesystem
attributes for: /home/user/oozie/core/pom.xml: Failed to retrieve numeric
file attributes using: '/bin/sh -c ls -1nlad
/home/user/oozie/core/pom.xml': Error while executing process. Cannot run
program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
[Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command


On Thu, Sep 25, 2014 at 7:42 AM, Muthu Pandi  wrote:

> replace hadoop-2 with hadoop-1
>
>
>
> Regards,
> Muthupandi.K
>
>  Think before you print.
>
>
>
> On Thu, Sep 25, 2014 at 4:48 PM, Rahul Channe 
> wrote:
>
>> Hi Muthu,
>>
>> I am trying to build oozie against hadoop 1
>>
>>
>> On Thursday, September 25, 2014, Muthu Pandi  wrote:
>>
>>> Hi Rahul
>>>
>>> Distro error may occur while using this command  bin/mkdistro.sh
>>> -DskipTests instead of that use
>>>
>>> mvn clean package assembly:single -P hadoop-2 -DskipTests
>>>
>>> if you are building against hadoop 2
>>

query with FIRST_VALUE/LAST_VALUE functions keeps running forever

2014-09-25 Thread Dima Fadeyev

Hello everyone,

I'm trying to run a query on an 8-node cluster with hive-0.13 (MapR 3.1.1):

SELECT FIRST_VALUE(col_a) OVER (PARTITION BY col_b ORDER BY col_c) FROM test;


If any partition is over 3 rows, the reduce phase of my query keeps
running forever (until the job is killed by the JobTracker).


Is this normal behavior? A normal ORDER BY on a table of 7 million
rows takes about 70 seconds to complete on the same cluster.
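
For reference, a variant that might be worth testing while narrowing this
down (same table and columns as above): with an ORDER BY, the default window
frame is a RANGE frame, and pinning an explicit ROWS frame can behave quite
differently.

SELECT FIRST_VALUE(col_a) OVER (
  PARTITION BY col_b
  ORDER BY col_c
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) FROM test;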


Thanks and best regards,

Re: oozie installation error

2014-09-25 Thread Muthu Pandi
replace hadoop-2 with hadoop-1



Regards,
Muthupandi.K

 Think before you print.



On Thu, Sep 25, 2014 at 4:48 PM, Rahul Channe 
wrote:

> Hi Muthu,
>
> I am trying to build oozie against hadoop 1
>
>
> On Thursday, September 25, 2014, Muthu Pandi  wrote:
>
>> Hi Rahul
>>
>> Distro error may occur while using this command  bin/mkdistro.sh
>> -DskipTests instead of that use
>>
>> mvn clean package assembly:single -P hadoop-2 -DskipTests
>>
>> if you are building against hadoop 2
>>
>> it resolved the distro error.
>>
>> For more info follow http://gauravkohli.com/category/oozie/
>>
>>
>>
>> Regards,
>> Muthupandi.K
>>
>>  Think before you print.
>>
>>
>>
>> On Wed, Sep 24, 2014 at 11:22 PM, Rahul Channe 
>> wrote:
>>
>>> hi All,
>>>
>>> I am trying to install oozie and getting following error. Any input is
>>> appreciated
>>>
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> 
>>> [INFO] Total time: 18.326s
>>> [INFO] Finished at: Wed Sep 24 13:45:13 EDT 2014
>>> [INFO] Final Memory: 26M/64M
>>> [INFO]
>>> 
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
>>> project oozie-client: Error assembling JAR: Failed to read filesystem
>>> attributes for: /home/user/oozie/client/pom.xml: Failed to retrieve numeric
>>> file attributes using: '/bin/sh -c ls -1nlad
>>> /home/user/oozie/client/pom.xml': Error while executing process. Cannot run
>>> program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
>>> [Help 1]
>>> [ERROR]
>>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>>> -e switch.
>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>> [ERROR]
>>> [ERROR] For more information about the errors and possible solutions,
>>> please read the following articles:
>>> [ERROR] [Help 1]
>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>> [ERROR]
>>> [ERROR] After correcting the problems, you can resume the build with the
>>> command
>>> [ERROR]   mvn  -rf :oozie-client
>>>
>>> ERROR, Oozie distro creation failed
>>>
>>>
>>


Re: oozie installation error

2014-09-25 Thread Rahul Channe
Hi Muthu,

I am trying to build oozie against hadoop 1

On Thursday, September 25, 2014, Muthu Pandi  wrote:

> Hi Rahul
>
> Distro error may occur while using this command  bin/mkdistro.sh
> -DskipTests instead of that use
>
> mvn clean package assembly:single -P hadoop-2 -DskipTests
>
> if you are building against hadoop 2
>
> it resolved the distro error.
>
> For more info follow http://gauravkohli.com/category/oozie/
>
>
>
> Regards,
> Muthupandi.K
>
>  Think before you print.
>
>
>
> On Wed, Sep 24, 2014 at 11:22 PM, Rahul Channe wrote:
>
>> hi All,
>>
>> I am trying to install oozie and getting following error. Any input is
>> appreciated
>>
>> [INFO] BUILD FAILURE
>> [INFO]
>> 
>> [INFO] Total time: 18.326s
>> [INFO] Finished at: Wed Sep 24 13:45:13 EDT 2014
>> [INFO] Final Memory: 26M/64M
>> [INFO]
>> 
>> [ERROR] Failed to execute goal
>> org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar (default-jar) on
>> project oozie-client: Error assembling JAR: Failed to read filesystem
>> attributes for: /home/user/oozie/client/pom.xml: Failed to retrieve numeric
>> file attributes using: '/bin/sh -c ls -1nlad
>> /home/user/oozie/client/pom.xml': Error while executing process. Cannot run
>> program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory ->
>> [Help 1]
>> [ERROR]
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>> -e switch.
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> [ERROR]
>> [ERROR] For more information about the errors and possible solutions,
>> please read the following articles:
>> [ERROR] [Help 1]
>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>> [ERROR]
>> [ERROR] After correcting the problems, you can resume the build with the
>> command
>> [ERROR]   mvn  -rf :oozie-client
>>
>> ERROR, Oozie distro creation failed
>>
>>
>


Testing hive0.13 for transactions

2014-09-25 Thread supriya

Hi All

I am trying to test the transaction feature, especially compaction, of
Hive 0.13. I am getting a NullPointerException at
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:244)
while trying to insert into the bucketed table. I have described below, step
by step, what I am doing. Please guide me as to where I am going wrong.


hive> CREATE EXTERNAL TABLE EMP (ID INT, NAME STRING, COUNTRY STRING, VAR STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> LOCATION '/tmp/employees';

hive> SELECT * FROM EMP;
OK
1   A   US  x
2   B   UK  y
3   C   AUS 1
4   D   UK  2
5   E   US  1
6   F   UK  2
7   G   AUS x
8   H   IND y
9   I   US  x
10  J   UK  y

hive> CREATE EXTERNAL TABLE BUCKET_EMP (ID INT, NAME STRING, VAR STRING)
> PARTITIONED BY (COUNTRY STRING)
> CLUSTERED BY(VAR) INTO 3 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS ORC
> LOCATION '/tmp/bucket_emp';

hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
hive> INSERT OVERWRITE TABLE BUCKET_EMP
> PARTITION(COUNTRY)
> SELECT ID, NAME , VAR, COUNTRY
> FROM EMP;

hive> SELECT * FROM BUCKET_EMP;
OK
7   G   x   AUS
3   C   1   AUS
8   H   y   IND
10  J   y   UK
2   B   y   UK
6   F   2   UK
4   D   2   UK
9   I   x   US
1   A   x   US
5   E   1   US

hive> SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> SET hive.compactor.initiator.on = true;
hive> SET hive.compactor.worker.threads = 3;
hive> SET hive.compactor.check.interval = 300;
hive> SET hive.compactor.delta.num.threshold = 1;

hive> INSERT OVERWRITE TABLE BUCKET_EMP
> PARTITION(COUNTRY)
> SELECT ID, NAME,
> CASE WHEN VAR = '1' THEN 'X' WHEN VAR = '2' THEN 'Y' END AS VAR, COUNTRY
> FROM EMP;

java.lang.NullPointerException
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:244)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:79)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1411574868628_0015 with exception
'java.lang.NullPointerException(null)'
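
One thing I am unsure about is whether these SET commands are enough, or
whether the settings must live in hive-site.xml so that the metastore
service (which runs the compactor) also sees them. For reference, the
equivalent entries would be (a sketch, not a verified fix):

<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>3</value>
</property>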


Thanks
Supriya




Re: Getting access to hadoop output from Hive JDBC session

2014-09-25 Thread Vaibhav Gumashta
Alex,

This is probably what you are looking for: Beeline should have an option
for the user to see the query progress (
https://issues.apache.org/jira/browse/HIVE-7615).
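
If you need the same thing programmatically rather than through Beeline, a
rough sketch is below — hedged: it assumes a driver version that exposes
org.apache.hive.jdbc.HiveStatement's getQueryLog() and hasMoreLogs() (the
HIVE-4629/HIVE-7615 line of work), and the URL, credentials and query are
all illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveStatement;

public class ProgressDemo {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "user", "");
    final HiveStatement stmt = (HiveStatement) conn.createStatement();

    // Poll the server-side operation log (job IDs, map/reduce percentages)
    // from a second thread while execute() blocks in this one.
    Thread poller = new Thread(new Runnable() {
      public void run() {
        try {
          while (stmt.hasMoreLogs()) {
            for (String line : stmt.getQueryLog()) {
              System.out.println(line);
            }
            Thread.sleep(500);
          }
        } catch (Exception e) {
          // Statement closed or cancelled; stop polling.
        }
      }
    });
    poller.start();

    stmt.execute("SELECT count(*) FROM some_table"); // illustrative query
    poller.join();
    conn.close();
  }
}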

Thanks,
--Vaibhav

On Mon, Aug 11, 2014 at 11:34 PM, Alexander Kolbasov 
wrote:

>  Hello,
>
>  I am switching from Hive 0.9 to Hive 0.12 and decided to start using
> Hive metadata server mode. As it turns out, Hive1 JDBC driver connected as
> "jdbc:hive://" only works via direct access to the metastore database. The
> Hive2 driver connected as "jdbc:hive2://" does work with the remote Hive
> metastore server, but there is another serious difference in behavior. When
> I was using Hive1 driver I saw Hadoop output - the information about Hive
> job ID and the usual Hadoop output showing percentages of map and reduce
> done. The Hive2 driver silently waited for map/reduce to complete and just
> produced the result.
>
>  As I can see, both Hive itself and beeline are able to get the same
> Hadoop output as I was getting with Hive1 driver, so it should be somehow
> possible but it isn't clear how they do this. Can someone suggest the way
> to get Hadoop output with Hive2 JDBC driver?
>
>  Thanks for any help!
>
>  - Alex
>
>
>
>



Re: trouble starting hiveserver2

2014-09-25 Thread Vaibhav Gumashta
Hi Praveen,

hive.server2.servermode is not a valid Hive config. Please
use hive.server2.transport.mode to specify which transport mode you'd like
to use for sending Thrift messages between the client and the server. Can
you turn debug logging on to see if you get any more info?
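
For example (a sketch, matching the hive.cmd invocation in your mail):

hive.cmd --service hiveserver2 --hiveconf hive.root.logger=DEBUG,console

That sends the root logger to the console at DEBUG level, so anything that
goes wrong during startup should be visible.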

Thanks,
--Vaibhav

On Thu, Sep 18, 2014 at 1:22 AM,  wrote:

>   Hi,
>
>  I am unable to start the hiveserver2 service on Windows. I am starting
> this service to configure an ODBC connection to Excel.
>
>  I am issuing the following command to start the service:
> > hive.cmd --service hiveserver2
>
>  It is not throwing any errors. When I run with the "echo on" option,
> I can see it is trying to run the following command:
>
>  c:\hive\bin> call C:\hadoop\bin\hadoop.cmd jar
> C:\hive\lib\hive-service-0.13.1.jar org.apache.hive.service.server.HiveServer2
>
>  The script neither exits nor prints any error messages after this.
>
>  The following are my hive-site.xml settings:
>
> <property>
>   <name>hive.server2.servermode</name>
>   <value>thrift</value>
>   <description>Server mode. "thrift" or "http".</description>
> </property>
>
> <property>
>   <name>hive.server2.transport.mode</name>
>   <value>binary</value>
>   <description>Server transport mode. "binary" or "http".</description>
> </property>
>
> <property>
>   <name>hive.server2.thrift.port</name>
>   <value>1</value>
> </property>
>
> <property>
>   <name>hive.server2.thrift.bind.host</name>
>   <value>localhost</value>
> </property>
>
> <property>
>   <name>hive.server2.authentication</name>
>   <value>NONE</value>
> </property>
>
>  Regards,
> Praveen.
>



Re: Hiveserver2 crash with RStudio (using RJDBC)

2014-09-25 Thread Vaibhav Gumashta
Nathalie,

Can you grab a heapdump at the time the server crashes (export this to your
environment: HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath= $HADOOP_CLIENT_OPTS".)? What type of
metastore are you using with HiveServer2 - embedded (if you specify
-hiveconf hive.metastore.uris=" " in the HiveServer2 startup command, it
uses embedded metastore) or remote?
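
Spelled out as a sketch (the dump path below is illustrative; point it at a
disk with enough free space):

export HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hs2.hprof $HADOOP_CLIENT_OPTS"
hive --service hiveserver2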

Thanks,
--Vaibhav

On Mon, Sep 22, 2014 at 10:55 AM, Nathalie Blais  wrote:

>  Hello,
>
>
>
> We are currently experiencing a severe reproducible hiveserver2 crash when
> using the RJDBC connector in RStudio (please refer to the description below
> for the detailed test case).  We have a hard time pinpointing the source of
> the problem and we are wondering whether this is a known issue or we have a
> glitch in our configuration; we would sincerely appreciate your input on
> this case.
>
>
>
> Case
>
> Severe Hiveserver2 crash when returning “a certain” volume of data (really
> not that big) to RStudio through RJDBC
>
>
>
> Config Versions
>
> Hadoop Distribution: Cloudera – cdh5.0.1p0.47
>
> Hiveserver2: 0.12
>
> RStudio: 0.98.1056
>
> RJDBC: 0.2-4
>
>
>
> How to Reproduce
>
> 1.   In a SQL client application (Aqua Data Studio was used for the
> purpose of this example), create Hive test table
>
> a.   create table test_table_connection_crash(col1 string);
>
> 2.   Load data into table (data file attached)
>
> a.   LOAD DATA INPATH '/user/test/testFile.txt' INTO TABLE
> test_table_connection_crash;
>
> 3.   Verify row count
>
> a.   select count(*) nbRows from test_table_connection_crash;
>
> b.  720 000 rows
>
> 4.   Display all rows
>
> a.   select * from test_table_connection_crash order by col1 desc
>
> b.  All the rows are returned by the Map/Reduce to the client and
> displayed properly in the interface
>
> 5.   Open RStudio
>
> 6.   Create connection to Hive
>
> a.   library(RJDBC)
>
> b.  drv <- JDBC(driverClass="org.apache.hive.jdbc.HiveDriver",
> classPath=list.files("D:/myJavaDriversFolderFromClusterInstall/",
> pattern="jar$", full.names=T), identifier.quote="`")
>
> c.   conn <- dbConnect(drv,
> "jdbc:hive2://server_name:1/default;ssl=true;sslTrustStore=C:/Progra~1/Java/jdk1.7.0_60/jre/lib/security/cacerts;trustStorePassword=pswd",
> "user", "password")
>
> 7.   Verify connection with a small query
>
> a.   r <- dbGetQuery(conn, "select * from test_table_connection_crash
> order by col1 desc limit 100")
>
> b.  print(r)
>
> c.   100 rows are returned to RStudio and properly displayed in the
> console interface
>
> 8.   Remove the limit and try the original query (as performed in the
> SQL client application)
>
> a.   r <- dbGetQuery(conn, "select * from test_table_connection_crash
> order by col1 desc")
>
> b.  Query starts running
>
> c.   *** Cluster crash ***
>
>
>
> Worst case, if the RStudio desktop client cannot handle such an amount of
> data, we might expect the desktop application to crash; not the whole
> hiveserver2.
>
>
>
> Please let us know whether or not you are aware of any issues of the
> kind.  Also, please do not hesitate to request any configuration file you
> might need to examine.
>
>
>
> Thank you very much!
>
>
>
> Best regards,
>
>
>
> Nathalie
>
>
>
>
>
> [image: dna_signature]
>
> *Nathalie Blais*
>
> B.I. Developer | Technology Group
>
> Ubisoft Montreal
>
>
>
>
>



hue not writing data to sql tables

2014-09-25 Thread siva kumar
Hi folks,
      I need to populate the Hive query history in a SQL table; I am
querying it through Hue. My Hue default database is sqlite3. When I say
'.tables' it displays the list of tables, where I can find auth_user,
beeswax_queryhistory and many other tables as well. But when I say
'select username from auth_user', no values are displayed. I even tried
configuring MySQL as my Hue database; here are the configuration properties:
engine=mysql
host=localhost
port=3306
user=root
password=
name=hue
But I can only find the tables mentioned above, and no data is populated in
the respective tables when I query or create users. I am working with CM on
VMware. Are there any other configurations needed to populate the data? I am
really having a tough time. Any quick response is greatly appreciated.
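
Is some schema initialization like the following required after switching
databases? (A sketch only; the Hue install path varies by distribution, so
the one below is illustrative.)

/usr/lib/hue/build/env/bin/hue syncdb --noinput
sudo service hue restart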



Thanks and regards,
sivakumar.c