Thanks for keeping this discussion going.

On Sun, Sep 18, 2016 at 8:13 AM, Amos Bird <amosb...@gmail.com> wrote:

>
> > On Fri, Sep 16, 2016 at 9:06 PM, Amos Bird <amosb...@gmail.com> wrote:
> >
> >>
> >> Hi there,
> >>
> >> I followed the wiki
> >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
> >> carefully but still have some problems in my local env.
> >>
> >> 1. I need to manually execute "hdfs dfs -mkdir /test-warehouse/emptytable"
> >> to get rid of some fe test error.
> >>
> >>
> > Ideally, you should not have to do this. Could you tell me what errors you
> > encountered? Sounds like there may be a test or data loading bug we should
> > fix.
>
> The error is :
>
> TestLoadData(com.cloudera.impala.analysis.AnalyzeStmtsTest)  Time
> elapsed: 0.033 sec  <<< FAILURE!
> java.lang.AssertionError: got error:
> INPATH location 'hdfs://localhost:20500/test-warehouse/emptytable' does
> not exist.
> expected:
> INPATH location 'hdfs://localhost:20500/test-warehouse/emptytable'
> contains no visible files.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at com.cloudera.impala.common.FrontendTestBase.AnalysisError(
> FrontendTestBase.java:312)
>   at com.cloudera.impala.common.FrontendTestBase.AnalysisError(
> FrontendTestBase.java:292)
>   at com.cloudera.impala.analysis.AnalyzeStmtsTest.TestLoadData(
> AnalyzeStmtsTest.java:2860)
>
>
Do you have a table functional.emptytable? If yes, then what location is
reported in "show create table"?
Does the directory exist in HDFS?

You could try to manually reload the table and see if the directory is
created:
bin/load-data.py -f -w functional-query --table_names=emptytable --table_formats=text/none
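
If it helps, here is a minimal Python sketch of the location check. It pulls the LOCATION out of "show create table" output and builds the matching "hdfs dfs -test -d" command. The helper names and the sample DDL text are made up for illustration; only the path comes from the error message above.

```python
import re
from urllib.parse import urlparse


def table_location(show_create_output):
    """Return the LOCATION URI found in SHOW CREATE TABLE output, or None."""
    match = re.search(r"LOCATION\s+'([^']+)'", show_create_output)
    return match.group(1) if match else None


def dir_check_command(location_uri):
    """Build an 'hdfs dfs -test -d' command for the path part of the URI."""
    return ["hdfs", "dfs", "-test", "-d", urlparse(location_uri).path]


# Hypothetical output, only shaped like what "show create table" prints.
SAMPLE_DDL = """CREATE TABLE functional.emptytable (c INT)
LOCATION 'hdfs://localhost:20500/test-warehouse/emptytable'"""

if __name__ == "__main__":
    location = table_location(SAMPLE_DDL)
    print(location)
    # Run the printed command by hand, or pass the list to subprocess.call();
    # an exit status of 0 means the directory exists.
    print(" ".join(dir_check_command(location)))
```

If the command exits non-zero while the table reports that location, the directory was never created during data loading.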

> >>
> >> 2. I have authz-policy.ini in HDFS, but I still get authorization
> errors.
> >>
> >> TestSelect[0](com.cloudera.impala.analysis.AuthorizationTest)  Time
> >> elapsed: 0.333 sec  <<< FAILURE!
> >> java.lang.AssertionError: got error:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> default.nodb
> >> expected:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> nodb.alltypes
> >>   at org.junit.Assert.fail(Assert.java:88)
> >>   at org.junit.Assert.assertTrue(Assert.java:41)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2220)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2203)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2197)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.TestSelect(
> >> AuthorizationTest.java:512)
> >>
> >> TestSelect[1](com.cloudera.impala.analysis.AuthorizationTest)  Time
> >> elapsed: 0.324 sec  <<< FAILURE!
> >> java.lang.AssertionError: got error:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> default.nodb
> >> expected:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> nodb.alltypes
> >>   at org.junit.Assert.fail(Assert.java:88)
> >>   at org.junit.Assert.assertTrue(Assert.java:41)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2220)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2203)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.AuthzError(
> >> AuthorizationTest.java:2197)
> >>   at com.cloudera.impala.analysis.AuthorizationTest.TestSelect(
> >> AuthorizationTest.java:512)
> >>
> >>
> >> Results :
> >>
> >> Failed tests:
> >>   AuthorizationTest.TestSelect:512->AuthzError:2197->
> >> AuthzError:2203->AuthzError:2220 got error:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> default.nodb
> >> expected:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> nodb.alltypes
> >>   AuthorizationTest.TestSelect:512->AuthzError:2197->
> >> AuthzError:2203->AuthzError:2220 got error:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> default.nodb
> >> expected:
> >> User 'amos' does not have privileges to execute 'SELECT' on:
> nodb.alltypes
> >>
> >>
> >>
> > Strange. In this test, we register two authorization requests, and it seems
> > like those are not checked in the expected order. However, that should not
> > be possible because we store them in a LinkedHashSet.
> > Could you dig into this a little further to see if you can figure out why
> > the order is wrong?
> >
> > This is where we register the authorization requests:
> > https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java#L544
> >
> > This is where we check the authorization requests:
> > https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java#L391
> >
> >
>
> I tried directly executing "select 1 from nodb.alltypes" in
> impala-shell, leading to this error:
> ERROR: AnalysisException: Could not resolve table reference:
> 'nodb.alltypes'
>
> How can I reproduce the authorization tests in impala-shell so I can
> debug it?
>
>
>
FYI, this is actually a known issue and may have something to do with the
JRE version you are running: https://issues.cloudera.org/browse/IMPALA-3643
As far as I can tell the bug should be "impossible" because we use a
LinkedHashSet, but maybe certain JREs do not properly honor the guarantees.

The AuthorizationTests in particular require a non-trivial setup, so I'd
not recommend trying to debug via the Impala shell.

I'd recommend debugging in one of these ways:
- Run the test manually via "mvn test -Dtest=AuthorizationTest" from the FE
directory. Attach debugger and break in TestSelect().
- Run the JUnit test from an IDE such as Eclipse and then debug the test.
I'm afraid there is no easy way to run just that single query in our
current test setup. You will need to run the whole suite, but you can break
in TestSelect() or hack the code in various places to set useful breakpoints.

Hope that helps.

>
> >>
> >>
> >> 3. For end-to-end tests, I encountered two kinds of errors
> >>
> >>   a) connection refused.
> >>
> >>   SET sync_ddl=False;
> >> -- executing against localhost:21000
> >> DROP DATABASE `test_drop_cleans_hdfs_dirs_fdfd4f8` CASCADE;
> >>
> >> ___________________ ERROR at setup of TestLoadData.test_load[exec_
> option:
> >> {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_
> threshold':
> >> 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
> >> ___________________
> >> [gw5] linux2 -- Python 2.6.6 /home/amos/incubator-impala/
> >> bin/../infra/python/env/bin/python
> >> metadata/test_load.py:77: in setup_method
> >>     "{0}/{1}/100101.txt".format(STAGING_PATH, i))
> >> util/hdfs_util.py:122: in copy
> >>     data = self.read_file(src)
> >> ../infra/python/env/lib/python2.6/site-packages/
> pywebhdfs/webhdfs.py:183:
> >> in read_file
> >>     response = requests.get(uri, allow_redirects=True)
> >> ../infra/python/env/lib/python2.6/site-packages/requests/api.py:69: in
> get
> >>     return request('get', url, params=params, **kwargs)
> >> ../infra/python/env/lib/python2.6/site-packages/requests/api.py:50: in
> >> request
> >>     response = session.request(method=method, url=url, **kwargs)
> >> ../infra/python/env/lib/python2.6/site-packages/
> requests/sessions.py:465:
> >> in request
> >>     resp = self.send(prep, **send_kwargs)
> >> ../infra/python/env/lib/python2.6/site-packages/
> requests/sessions.py:594:
> >> in send
> >>     history = [resp for resp in gen] if allow_redirects else []
> >> ../infra/python/env/lib/python2.6/site-packages/
> requests/sessions.py:196:
> >> in resolve_redirects
> >>     **adapter_kwargs
> >> ../infra/python/env/lib/python2.6/site-packages/
> requests/sessions.py:573:
> >> in send
> >>     r = adapter.send(request, **kwargs)
> >> ../infra/python/env/lib/python2.6/site-packages/
> requests/adapters.py:415:
> >> in send
> >>     raise ConnectionError(err, request=request)
> >> E   ConnectionError: ('Connection aborted.', error(111, 'Connection
> >> refused'))
> >>
> >>
> > The connection refused issue is very bizarre. One thing that I noticed is
> > that your Python does not seem to match what we use (Python 2.7.3).
> > Could you re-run infra/python/bootstrap_virtualenv.py and see if you get
> > the expected version into infra/python/env/local/bin?
> >
> > Alternatively, maybe there's a problem with your /etc/hosts? You can try
> > searching online for WebHdfs and /etc/hosts
> >
>
> well, I find this 'find_py26.py' file under deps. Is it normal?
>

Yes, that's normal. That file looks for Python 2.6 on your system, but it
should not be relevant for running tests because we use the Python from our
virtualenv and not the one on your system.

What is the output when you run "impala-python --version"? You should get
"Python 2.7.3".
Also, what's the Python version on your system? Our virtualenv will use
Python 2.6 if your system has a Python < 2.7.
You could try to upgrade your system Python and then
re-run infra/python/bootstrap_virtualenv.py
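
To compare the two interpreters directly, a tiny script like the one below could be run once with impala-python and once with the system python. This is only a sketch: the file name check_python.py is hypothetical, and it sticks to syntax that Python 2.6 accepts.

```python
import sys


def interpreter_info():
    """Return (executable path, 'major.minor.micro') for this interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return sys.executable, version


if __name__ == "__main__":
    path, version = interpreter_info()
    # Explicit field indexes keep the format string valid on Python 2.6.
    print("{0} -> Python {1}".format(path, version))
```

For example: "impala-python check_python.py" followed by "python check_python.py". If both print the same path, the virtualenv is not the interpreter actually being used.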

Still, theoretically Python 2.6 in the virtual env should work. I think
it's more likely you are having a connection problem due to a misconfigured
/etc/hosts.

Are you running the test from a shell that has bin/impala-config.sh and
bin/set-classpath.sh sourced?

To further debug this you could try to specify your namenode address when
running the test to see whether it is somehow picking up a wrong address:
cd tests
./run-tests.py metadata/test_load.py --namenode_http_address=localhost:50070

And see if that works.
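
Independent of the test framework, a quick connectivity check can help narrow this down. The traceback fails inside resolve_redirects, which suggests the refused connection is to the redirect target rather than to the namenode itself, so it is worth checking how your machine's own hostname resolves. Below is a minimal sketch using only the standard library. The function names are made up, it assumes Python 3 (so run it with a system interpreter, not the virtualenv), and 50070 is the namenode HTTP port from the command above.

```python
import socket


def resolved_addresses(host):
    """Return the sorted IP addresses `host` resolves to ([] if it does not)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted(set(info[4][0] for info in infos))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Check both "localhost" and this machine's hostname, since a redirect
    # may point at the hostname rather than at localhost.
    for host in ("localhost", socket.gethostname()):
        print(host, resolved_addresses(host), can_connect(host, 50070))
```

If the hostname resolves to an unexpected address, or to nothing at all, that points at /etc/hosts.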



> [amos@nobida143 incubator-impala]$ ls infra/python/deps/
> download_requirements  find_py26.py  pip_download.py  requirements.txt
> [amos@nobida143 incubator-impala]$ cat infra/python/deps/download_requirements
> #!/bin/bash
>
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #   http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
>
> set -euo pipefail
>
> DIR="$(dirname "$0")"
>
> pushd "$DIR"
> PY26="$(./find_py26.py)"
> # Directly download packages listed in requirements.txt, but don't install them.
> "$PY26" pip_download.py
> # For virtualenv, other scripts rely on the .tar.gz package (not a .whl package).
> "$PY26" pip_download.py virtualenv 13.1.0
> # kudu-python is downloaded separately because pip install attempts to execute a
> # setup.py subcommand for kudu-python that can fail even if the download succeeds.
> "$PY26" pip_download.py kudu-python 0.1.1
> popd
>
>
>
> >>   b) stats not match
> >>
> >> [gw4] linux2 -- Python 2.6.6 /home/amos/incubator-impala/
> >> bin/../infra/python/env/bin/python
> >> metadata/test_metadata_query_statements.py:67: in test_show_stats
> >>     self.run_test_case('QueryTest/show-stats', vector, "functional")
> >> common/impala_test_suite.py:342: in run_test_case
> >>     self.__verify_results_and_errors(vector, test_section, result,
> use_db)
> >> common/impala_test_suite.py:234: in __verify_results_and_errors
> >>     replace_filenames_with_placeholder)
> >> common/test_result_verifier.py:398: in verify_raw_results
> >>     VERIFIER_MAP[verifier](expected, actual)
> >> common/test_result_verifier.py:231: in verify_query_result_is_equal
> >>     assert expected_results == actual_results
> >>
> >> ...
> >>
> >>    -- executing against localhost:21000
> >> show column stats alltypes_clone;
> >>
> >> MainThread: Comparing QueryTestResults (expected vs actual):
> >> 'bigint_col','BIGINT',10,-1,8,8 == 'bigint_col','BIGINT',10,-1,8,8
> >> 'bool_col','BOOLEAN',2,-1,1,1 == 'bool_col','BOOLEAN',2,-1,1,1
> >> 'date_string_col','STRING',736,-1,8,8 == 'date_string_col','STRING',
> >> 736,-1,8,8
> >> 'double_col','DOUBLE',-1,-1,8,8 == 'double_col','DOUBLE',-1,-1,8,8
> >> 'float_col','FLOAT',10,-1,4,4 == 'float_col','FLOAT',10,-1,4,4
> >> 'id','INT',7505,-1,4,4 == 'id','INT',7505,-1,4,4
> >> 'int_col','INT',-1,-1,4,4 == 'int_col','INT',-1,-1,4,4
> >> 'month','INT',12,0,4,4 == 'month','INT',12,0,4,4
> >> 'smallint_col','SMALLINT',10,-1,2,2 == 'smallint_col','SMALLINT',10,-
> 1,2,2
> >> 'string_col','STRING',10,-1,-1,-1 == 'string_col','STRING',10,-1,-1,-1
> >> 'timestamp_col','TIMESTAMP',7554,-1,16,16 !=
> 'timestamp_col','TIMESTAMP',
> >> 7552,-1,16,16
> >> 'tinyint_col','TINYINT',10,-1,1,1 == 'tinyint_col','TINYINT',10,-1,1,1
> >> 'year','INT',2,0,4,4 == 'year','INT',2,0,4,4
> >>
> >>
> > Very strange. Can you do a compute stats on functional.alltypes and
> > confirm that the NDV for timestamp_col is 7552 in your setup?
>
> Yes.
>

I'll need to ask around for help. I have no idea why this is happening.

>
> >
> >
> >
> >> I'm using CentOS 6.8 final. I have no idea what goes wrong. Any help is
> >> much appreciated!
> >
> >
> >
> >
> >>
> >> Best regards,
> >> Amos
> >>
>
>
