To be specific about "no error message": the logs written to the log
directory around the time of the crash are nearly identical to those from a
run that got much further, on a machine whose configuration I don't know
how to reproduce. The run that failed earlier has output like:

Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
    OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
    FAILED (Took: 0 min 11 sec)
    '/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3' failed. Tail of log:
Log for command '/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3'
Starting State Store logging to
/home/ubuntu/Impala/logs/data_loading/statestored.INFO
Starting Catalog Service logging to
/home/ubuntu/Impala/logs/data_loading/catalogd.INFO
Error starting cluster: Unable to start catalogd. Check log or file
permissions for more details.
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.HVkbPNl08R
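
For the run where catalogd refuses to start, here is roughly how I'm
triaging the daemon logs (a sketch: the temp directory and log contents
below are placeholders standing in for
/home/ubuntu/Impala/logs/data_loading, and the severity-letter line prefix
is my assumption that the daemons log in the usual glog style):

```shell
# Pull the ERROR/FATAL lines out of every *.INFO log in the directory.
# A throwaway directory with placeholder content stands in for the real
# log directory so this snippet is self-contained.
log_dir=$(mktemp -d)
printf '%s\n' \
  'I0729 15:09:25 12345 placeholder.cc:42] placeholder info line' \
  'E0729 15:09:26 12345 placeholder.cc:99] placeholder error line' \
  > "$log_dir/catalogd.INFO"
# glog-style lines begin with a severity letter (I/W/E/F) followed by
# MMDD; keep only errors (E) and fatals (F)
errors=$(grep -hE '^[EF][0-9]{4} ' "$log_dir"/*.INFO)
echo "$errors"
rm -rf "$log_dir"
```

In my case this turns up nothing, which is why I'm comparing the two runs
side by side.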


The run that got further (which I think may be dying from a spurious
out-of-disk failure that I'm putting on the back burner for now) has the
following output:

Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
    OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
    OK (Took: 0 min 11 sec)
Setting up HDFS environment (logging to
/home/ubuntu/Impala/logs/data_loading/setup-hdfs-env.log)...
    OK (Took: 0 min 8 sec)
Loading custom schemas (logging to
/home/ubuntu/Impala/logs/data_loading/load-custom-schemas.log)...
    OK (Took: 0 min 35 sec)
Loading functional-query data (logging to
/home/ubuntu/Impala/logs/data_loading/load-functional-query.log)...
    OK (Took: 37 min 14 sec)
Loading TPC-H data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpch.log)...
    OK (Took: 14 min 11 sec)
Loading nested data (logging to
/home/ubuntu/Impala/logs/data_loading/load-nested.log)...
    OK (Took: 3 min 41 sec)
Loading TPC-DS data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
    FAILED (Took: 5 min 50 sec)
    'load-data tpcds core' failed. Tail of log:
ss_net_paid_inc_tax,
ss_net_profit,
ss_sold_date_sk
from store_sales_unpartitioned
WHERE ss_sold_date_sk < 2451272
distribute by ss_sold_date_sk
INFO  : Query ID =
ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data
size: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_local1041198115_0826
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2017-07-29 15:09:25,495 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-07-29 15:09:32,498 Stage-1 map = 100%,  reduce = 0%
ERROR : Ended Job = job_local1041198115_0826 with errors
ERROR : FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1:  HDFS Read: 17615502357 HDFS Write: 12907849658 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing
command(queryId=ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d);
Time taken: 18.314 seconds
Error: Error while processing statement: FAILED: Execution Error, return
code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
(state=08S01,code=2)
java.sql.SQLException: Error while processing statement: FAILED: Execution
Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
        at
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
        at
org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
        at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
        at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
        at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
        at
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpcds-core-hive-generated.sql
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.Yfeh8QGfi1




On Sat, Jul 29, 2017 at 12:47 AM, Jim Apple <jbap...@cloudera.com> wrote:

> I'm seeing https://issues.apache.org/jira/browse/IMPALA-5700 when trying
> to bootstrap a new development environment on an EC2 machine with Ubuntu
> 14.04, 250GB of free disk space and over 60GB of free memory. I've seen
> this with and without the -so flag.
>
> I'm running the below script, which I thought was the canonical way to
> bootstrap a development environment. When catalogd doesn't start, I don't
> see anything amiss in any of the logs. I was thinking that maybe a port is
> closed that should be open? I only have port 22 open in my ec2 config.
>
> Has anyone else fixed a problem like this before?
>
> #!/bin/bash -eux
>
> IMPALA_REPO_URL=https://git-wip-us.apache.org/repos/asf/incubator-impala.git
> IMPALA_REPO_BRANCH=master
>
> sudo apt-get install --yes git
>
> sudo apt-get install --yes openjdk-7-jdk
>
> # JAVA_HOME needed by chef scripts
> export JAVA_HOME="/usr/lib/jvm/$(ls -tr /usr/lib/jvm/ | tail -1)"
> $JAVA_HOME/bin/javac -version
>
> # TODO: check that df . is large enough.
> df -h .
>
> IMPALA_LOCATION=Impala
>
> cd "/home/$(whoami)"
>
> git clone "${IMPALA_REPO_URL}" "${IMPALA_LOCATION}"
> cd "${IMPALA_LOCATION}"
> git checkout "${IMPALA_REPO_BRANCH}"
> GIT_LOG_FILE=$(mktemp)
> git log --pretty=oneline >"${GIT_LOG_FILE}"
> head "${GIT_LOG_FILE}"
>
> ./bin/bootstrap_development.sh
>
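
As an aside, the TODO in the quoted script could be filled in with
something like the following (a sketch only: the 120GB threshold is my own
guess, not a documented requirement; for scale, the failed TPC-DS load
above wrote roughly 13GB to HDFS on its own):

```shell
# Possible fill-in for "TODO: check that df . is large enough".
# df -Pk gives POSIX-format output in 1024-byte blocks; field 4 of the
# second line is the available space.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
required_kb=$((120 * 1024 * 1024))  # ~120GB in KB; assumed threshold
if [ "${avail_kb:-0}" -lt "$required_kb" ]; then
  echo "WARNING: only ${avail_kb:-0} KB free; data loading may run out of disk" >&2
fi
echo "available: ${avail_kb} KB"
```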
